spaCy

spaCy is a free, open-source library designed for efficient NLP. It is typically used to build production-grade pipelines on top of pre-trained models, for tasks such as information extraction or sentiment analysis of reviews. It can also be used to extract key phrases and keywords from input text. The library and its small English model can be installed using the following commands.

pip install -U spacy
python -m spacy download en_core_web_sm

The following Python code extracts keyphrases using spaCy. It also uses the wordcloud and matplotlib packages to visualize the result.

Python3

# Importing libraries 
import spacy 
from wordcloud import WordCloud 
import matplotlib.pyplot as plt 
  
# Initializing the spaCy model instance 
nlp = spacy.load('en_core_web_sm') 
  
# Input text 
input_text = ''' 
NLP stands for Natural Language Processing. 
It is the branch of Artificial Intelligence that gives the ability to machine understand  
and process human languages. Human languages can be in the form of text or audio format. 
Natural Language Processing started in 1950 When Alan Mathison Turing published  
an article in the name Computing Machinery and Intelligence.  
It is based on Artificial intelligence. It talks about automatic interpretation and  
generation of natural language. 
As the technology evolved, different approaches have come to deal with NLP tasks. 
'''
  
# Creating a spaCy document 
spacy_doc = nlp(input_text) 
  
# Initializing keywords list variable 
keywords = [] 
  
# Extracting keyphrases: keep every noun chunk that is not itself a stop word (e.g. 'It') 
for chunk in spacy_doc.noun_chunks: 
    if chunk.text.lower() not in nlp.Defaults.stop_words: 
        keywords.append(chunk.text) 
  
# Displaying the keywords 
print(keywords) 
  
# Generate WordCloud 
wordcloud = WordCloud().generate(' '.join(keywords)) 
  
# Display the WordCloud 
plt.figure(figsize=(10,10)) 
plt.imshow(wordcloud, interpolation='bilinear') 
plt.axis('off') 
plt.show()

Output:

['\nNLP', 'Natural Language Processing', 'the branch', 'Artificial Intelligence', 'the ability', 
'human languages', 'Human languages', 'the form', 'text', 'audio format', 'Natural Language Processing',
 'Alan Mathison Turing', 'an article', 'the name', 'Computing Machinery', 'Intelligence', 
 'Artificial intelligence', 'automatic interpretation', 'generation', 'natural language', 
 'the technology', 'different approaches', 'NLP tasks']

Keyword Extraction using spaCy
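The raw noun chunks above still contain a stray newline ('\nNLP'), leading articles ('the branch', 'an article'), and case-variant duplicates ('human languages' and 'Human languages'). A minimal post-processing sketch is shown below. The helper name clean_keyphrases and the small article list are assumptions made for illustration; they are not part of spaCy.

```python
def clean_keyphrases(phrases):
    """Strip whitespace, drop a leading article, and remove case-insensitive duplicates."""
    articles = {"a", "an", "the"}  # illustrative list, not taken from spaCy
    seen = set()
    cleaned = []
    for phrase in phrases:
        words = phrase.split()  # split() also discards newlines and extra spaces
        if words and words[0].lower() in articles:
            words = words[1:]
        text = " ".join(words)
        key = text.lower()
        if text and key not in seen:  # keep only the first spelling of each phrase
            seen.add(key)
            cleaned.append(text)
    return cleaned


# A few of the noun chunks from the output above
raw_keywords = ['\nNLP', 'Natural Language Processing', 'the branch',
                'Artificial Intelligence', 'the ability', 'human languages',
                'Human languages', 'the form']

print(clean_keyphrases(raw_keywords))
```

The same function can be applied to the full keywords list before it is passed to the word cloud. Keeping the first spelling of each duplicate is an arbitrary choice.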

Textacy

Textacy is a Python library built on top of spaCy that provides a higher-level interface for common natural language processing (NLP) workflows. Among other utilities, it implements several keyterm extraction algorithms, including TextRank and SGRank, which are used in the code below. It can be installed using the following command.

pip install textacy

Python3

# Importing libraries 
import textacy 
import textacy.extract 
from wordcloud import WordCloud 
import matplotlib.pyplot as plt 
  
#Load a spacy model, which will be used for all further processing. 
en = textacy.load_spacy_lang("en_core_web_sm") 
  
# Input text 
input_text = ''' 
NLP stands for Natural Language Processing. 
It is the branch of Artificial Intelligence that gives the ability to machine understand  
and process human languages. Human languages can be in the form of text or audio format. 
Natural Language Processing started in 1950 When Alan Mathison Turing published  
an article in the name Computing Machinery and Intelligence.  
It is based on Artificial intelligence. It talks about automatic interpretation and  
generation of natural language. 
As the technology evolved, different approaches have come to deal with NLP tasks. 
'''
  
  
#convert the text into a spacy document. 
doc = textacy.make_spacy_doc(input_text, lang=en) 
  
  
#Print the keywords using TextRank algorithm, as implemented in Textacy. 
print("Textrank output: \n", textacy.extract.keyterms.textrank(doc, 
                                                             normalize="lemma", 
                                                             topn=5)) 
  
# Collect the TextRank keyterms (dropping the scores) for the word cloud 
keywords = [kps for kps, weights in 
                            textacy.extract.keyterms.textrank(doc, 
                                                              normalize="lemma")] 
  
# Generate WordCloud 
wordcloud = WordCloud().generate(' '.join(keywords)) 
  
# Display the WordCloud 
plt.figure(figsize=(10,10)) 
plt.imshow(wordcloud, interpolation='bilinear') 
plt.axis('off') 
plt.title('Textrank') 
plt.show() 
  
#Print the key words and phrases, using SGRank algorithm, as implemented in Textacy 
print("SGRank output: \n", [kps for kps, weights in 
                          textacy.extract.keyterms.sgrank(doc, topn=5)]) 
  
# Collect the SGRank keyterms (dropping the scores) for the word cloud 
keywords = [kps for kps, weights in 
                            textacy.extract.keyterms.sgrank(doc, normalize="lemma")] 
  
# Generate WordCloud 
wordcloud = WordCloud().generate(' '.join(keywords)) 
  
# Display the WordCloud 
plt.figure(figsize=(10,10)) 
plt.imshow(wordcloud, interpolation='bilinear') 
plt.axis('off') 
plt.title('SGRank') 
plt.show()

Output:

Textrank output: 
[('Natural Language Processing', 0.044047486408196675), ('Alan Mathison Turing', 0.04176581650758854),
 ('Artificial Intelligence', 0.04001459501418585), ('human language', 0.03494095073620351), 
('NLP task', 0.03217996705388366)]

Keyword Extraction using Textacy Textrank
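In the code above, the keyterms are joined into one string before being passed to WordCloud, so the cloud is built from the re-tokenized text rather than from the phrases and their scores. One possible alternative, sketched below, is to keep each phrase intact as a dictionary key with a weight derived from its TextRank score. The variable names are illustrative, and the scaling to a maximum of 1.0 is an assumption made for readability.

```python
# (phrase, score) pairs copied from the TextRank output above
textrank_output = [
    ('Natural Language Processing', 0.044047486408196675),
    ('Alan Mathison Turing', 0.04176581650758854),
    ('Artificial Intelligence', 0.04001459501418585),
    ('human language', 0.03494095073620351),
    ('NLP task', 0.03217996705388366),
]

# Scale the scores so that the top phrase has weight 1.0
top_score = max(score for _, score in textrank_output)
phrase_weights = {phrase: score / top_score for phrase, score in textrank_output}

for phrase, weight in phrase_weights.items():
    print(f"{phrase}: {weight:.2f}")

# Optional, if the wordcloud package is installed:
# wordcloud = WordCloud().generate_from_frequencies(phrase_weights)
```

With this mapping, each multi-word phrase would appear in the cloud as a single unit sized by its score.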

SGRank output: 
 ['Natural Language Processing', 'Alan Mathison Turing', 'human language', 'NLP', 'Artificial Intelligence']

Keyword Extraction using Textacy SGRank
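To give an intuition for what a TextRank-style ranking does, the sketch below scores single words by running a PageRank-like update over a word co-occurrence graph. It is a simplified illustration and not Textacy's implementation, which works on spaCy documents, normalizes terms, and assembles multi-word phrases. The function name textrank_words, the window size, and the hand-filtered token list are all assumptions made for the example.

```python
from collections import defaultdict


def textrank_words(tokens, window=2, damping=0.85, iterations=50):
    """Score words with a PageRank-style update over a co-occurrence graph."""
    # Link two different words if they occur within `window` tokens of each other
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Start from uniform scores and repeat the update a fixed number of times
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hand-filtered content words (stop words already removed), for illustration only
tokens = ["natural", "language", "processing", "branch", "artificial",
          "intelligence", "human", "language", "text", "audio", "natural",
          "language", "processing", "artificial", "intelligence", "natural",
          "language"]

ranked = textrank_words(tokens)
for word, score in ranked[:3]:
    print(word, round(score, 3))
```

Words that co-occur with many other words accumulate higher scores, which is why well-connected terms such as those in "Natural Language Processing" tend to surface near the top of the Textacy output.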

Keyphrase Extraction in NLP

In this article, we will learn how to perform keyphrase and keyword extraction from text using natural language processing techniques. We will first discuss keyphrase and keyword extraction and then look at its implementation in Python, using popular libraries including spaCy, YAKE, and rake-nltk.
