In this blog, we will extract noun phrases from a sentence using the TextBlob, spaCy and NLTK libraries.
TextBlob can be installed or upgraded with pip:
pip install -U textblob
python -m textblob.download_corpora
Processing raw text intelligently is difficult: most words are rare, and it's common for words that look completely different to mean almost the same thing. The same words in a different order can mean something completely different. Even splitting text into useful word-like units can be difficult in many languages. While it’s possible to solve some problems starting from only the raw characters, it’s usually better to use linguistic knowledge to add useful information. That’s exactly what spaCy is designed to do: you put in raw text, and get back a Doc object, that comes with a variety of annotations.
The download command installs the model package via pip and places it in your site-packages directory.
pip install -U spacy
python -m spacy download en_core_web_sm
The Natural Language Toolkit (NLTK) is an open source Python library for Natural Language Processing.
pip install --user -U nltk
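If you are using NLTK for the first time, you have to download the packages below from Python.
import nltk
nltk.download('brown')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# Newer NLTK releases may ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead; the error message names the missing resource.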
Now that the required libraries are installed, let us extract noun phrases from a sentence.
from textblob import TextBlob
text = TextBlob("Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, \
that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial \
intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make \
predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide \
variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is \
difficult or unfeasible to develop conventional algorithms to perform the needed tasks.")
Noun phrases are accessed through the noun_phrases property.
for np in text.noun_phrases:
    print(np)
This prints every noun phrase TextBlob found in the text, one per line.
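Because the sample text mentions some phrases more than once, the loop above can print duplicates. A minimal sketch of counting them with the standard library follows. `count_phrases` and the sample list are hypothetical, standing in for whatever `noun_phrases` returns on your installation.

```python
from collections import Counter


def count_phrases(phrases):
    """Count how often each phrase occurs, ignoring case and outer whitespace."""
    normalized = (p.strip().lower() for p in phrases)
    return Counter(p for p in normalized if p)


# Hypothetical sample, standing in for the result of text.noun_phrases.
sample_phrases = ["machine learning", "ml", "Machine learning", "sample data"]

counts = count_phrases(sample_phrases)
for phrase, n in counts.most_common():
    print(f"{phrase}: {n}")
```

Sorting by frequency is one simple way to see which phrases dominate a text.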
import spacy
nlp = spacy.load("en_core_web_sm")
text = "Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, \
that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial \
intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make \
predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide \
variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is \
difficult or unfeasible to develop conventional algorithms to perform the needed tasks."
doc = nlp(text)
doc
for np in doc.noun_chunks:
    print(np)
This prints every noun chunk spaCy found. On this text, spaCy returns more phrases than TextBlob did.
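To see where two extractors disagree, their outputs can be compared as sets. This is a rough sketch: `compare_phrases` is a hypothetical helper, and both sample lists are made up for illustration rather than copied from a real run.

```python
def compare_phrases(first, second):
    """Return (shared, only_first, only_second) as sorted lists of lowercased phrases."""
    a = {str(p).strip().lower() for p in first}
    b = {str(p).strip().lower() for p in second}
    return sorted(a & b), sorted(a - b), sorted(b - a)


# Made-up samples; in practice pass the TextBlob noun phrases and
# [chunk.text for chunk in doc.noun_chunks].
textblob_like = ["machine learning", "sample data"]
spacy_like = ["Machine learning", "a field", "sample data", "It"]

shared, only_first, only_second = compare_phrases(textblob_like, spacy_like)
print("shared:", shared)
print("only in first:", only_first)
print("only in second:", only_second)
```

Lowercasing before the comparison is a design choice here, so that phrases differing only in case count as the same.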
import nltk
from nltk import word_tokenize, pos_tag
text = "Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, \
that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial \
intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make \
predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide \
variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is \
difficult or unfeasible to develop conventional algorithms to perform the needed tasks."
tokens = word_tokenize(text)
tokens
parts_of_speech = nltk.pos_tag(tokens)
parts_of_speech
# Keep every noun tag (NN, NNS, NNP, NNPS), not only singular "NN"
nouns = [pair for pair in parts_of_speech if pair[1].startswith("NN")]
nouns
This returns individual nouns as (word, tag) pairs rather than multi-word noun phrases.
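To get from tagged tokens to noun phrases, one common approach is to group tokens that match a pattern such as an optional determiner, any adjectives, then one or more nouns. The sketch below hand-rolls that grouping in plain Python to show the idea. It is not NLTK's own chunker: `chunk_noun_phrases` is a hypothetical helper, the pattern is a simplifying assumption, and the tagged sample is written by hand in Penn Treebank style rather than produced by `pos_tag`.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into phrases matching DT? JJ* NN+.

    A simplified, hand-rolled pattern; real noun phrases can be more complex.
    """
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        # Optional determiner.
        if tagged[i][1] == "DT":
            i += 1
        # Any number of adjectives (JJ, JJR, JJS).
        while i < n and tagged[i][1].startswith("JJ"):
            i += 1
        # One or more nouns (NN, NNS, NNP, NNPS).
        noun_start = i
        while i < n and tagged[i][1].startswith("NN"):
            i += 1
        if i > noun_start:
            phrases.append(" ".join(word for word, _ in tagged[start:i]))
        else:
            # No noun here: move one token past the start and try again.
            i = start + 1
    return phrases


# Hand-written tags for illustration; in practice pass parts_of_speech.
tagged_sample = [
    ("Machine", "NN"), ("learning", "NN"), ("algorithms", "NNS"),
    ("build", "VBP"), ("a", "DT"), ("model", "NN"), ("based", "VBN"),
    ("on", "IN"), ("sample", "NN"), ("data", "NNS"),
]

phrases = chunk_noun_phrases(tagged_sample)
for phrase in phrases:
    print(phrase)
```

NLTK also ships a grammar-based chunker, `nltk.RegexpParser`, which expresses the same kind of pattern as a tag grammar and may be a better fit than a hand-written loop.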