Extract A Noun Phrase For A Sentence In Natural Language Processing

In this blog, we will extract noun phrases from a sentence using the TextBlob, spaCy and NLTK libraries.

What is TextBlob?

TextBlob: Simplified Text Processing

TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

Installation

We can install or upgrade it with pip:

pip install -U textblob

Download corpus

python -m textblob.download_corpora

What is spaCy?

Processing raw text intelligently is difficult: most words are rare, and it's common for words that look completely different to mean almost the same thing. The same words in a different order can mean something completely different. Even splitting text into useful word-like units can be difficult in many languages. While it’s possible to solve some problems starting from only the raw characters, it’s usually better to use linguistic knowledge to add useful information. That’s exactly what spaCy is designed to do: you put in raw text, and get back a Doc object, that comes with a variety of annotations.

Install spaCy with pip, then download the English model. The download command installs the model package via pip and places it in your site-packages directory.

pip install -U spacy

python -m spacy download en_core_web_sm

What is NLTK?

The Natural Language Toolkit (NLTK) is an open source Python library for Natural Language Processing.

Install NLTK

pip install --user -U nltk

If you are using NLTK for the first time, you have to download the packages below.

import nltk

nltk.download('brown')

nltk.download('punkt')

nltk.download('averaged_perceptron_tagger')

Now we have installed the required libraries. Let us extract nouns and noun phrases from a piece of text.

Extract Noun Phrases using TextBlob

from textblob import TextBlob

Write a sentence

text = TextBlob("Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, \
that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial \
intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make \
predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide \
variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is \
difficult or unfeasible to develop conventional algorithms to perform the needed tasks.")

Noun Phrase Extraction

Noun phrases are accessed through the noun_phrases property.

for np in text.noun_phrases:
    print(np)

machine
ml
building methods
leverage data
artificial intelligence
machine
learning algorithms
sample data
training data
machine
learning algorithms
wide variety
speech recognition
computer vision
conventional algorithms

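We got the noun phrases that TextBlob detected. Note that the output is lowercased and that "Machine learning" was not kept together as one phrase.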
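The list above contains repeats ("machine" and "learning algorithms" each appear more than once). A minimal sketch of counting and de-duplicating the phrases with the standard library; the list is copied by hand from the output above so the snippet runs without TextBlob, and the variable names are illustrative:

```python
from collections import Counter

# Noun phrases copied from the TextBlob output above.
phrases = [
    "machine", "ml", "building methods", "leverage data",
    "artificial intelligence", "machine", "learning algorithms",
    "sample data", "training data", "machine", "learning algorithms",
    "wide variety", "speech recognition", "computer vision",
    "conventional algorithms",
]

# Frequency of each phrase.
counts = Counter(phrases)

# Unique phrases in first-seen order (dicts preserve insertion order).
unique_phrases = list(dict.fromkeys(phrases))

# Show the three most frequent phrases with their counts.
for phrase, n in counts.most_common(3):
    print(phrase, n)
```

In real code, `phrases` would be `list(text.noun_phrases)` rather than a hand-typed list.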

Noun phrase extraction using spaCy

import spacy

Initialize a Language object

nlp = spacy.load("en_core_web_sm")

Define text

text = "Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, \
that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial \
intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make \
predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide \
variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is \
difficult or unfeasible to develop conventional algorithms to perform the needed tasks."

Pass the text to the Language object

doc = nlp(text)
doc

Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.

Get all the noun phrases with the noun_chunks property

for np in doc.noun_chunks:
    print(np)

Machine learning
ML
a field
inquiry
understanding and building methods
that
methods
that
data
performance
some set
tasks
It
a part
artificial intelligence
Machine learning algorithms
a model
sample data
training data
order
predictions
decisions
Machine learning algorithms
a wide variety
applications
medicine
email filtering
speech recognition
computer vision
it
conventional algorithms
the needed tasks

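We got the noun chunks. spaCy returns more phrases than TextBlob here, and its chunks include determiners ("a field", "the needed tasks") and pronouns ("that", "It").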
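Depending on the task, chunks such as "that" or "It" and leading determiners may be unwanted. A rough sketch of a string-level clean-up, assuming a small hand-picked word list; `PRONOUNS`, `DETERMINERS` and `clean_chunk` are hypothetical names, not part of spaCy, and the sample list is copied from the output above so the snippet runs on its own:

```python
# A few chunks copied from the spaCy output above.
chunks = [
    "Machine learning", "ML", "a field", "that", "some set",
    "It", "a part", "artificial intelligence", "the needed tasks",
]

# Hand-picked word lists (an assumption; extend them for your own text).
PRONOUNS = {"it", "that"}
DETERMINERS = {"a", "an", "the", "some"}


def clean_chunk(chunk):
    """Drop one leading determiner from a chunk, if present."""
    words = chunk.split()
    if words and words[0].lower() in DETERMINERS:
        words = words[1:]
    return " ".join(words)


cleaned = []
for chunk in chunks:
    # Skip chunks that are just a pronoun.
    if chunk.lower() in PRONOUNS:
        continue
    text = clean_chunk(chunk)
    # Skip chunks that become empty after stripping the determiner.
    if text:
        cleaned.append(text)

print(cleaned)
```

With a real Doc, the input would be `[chunk.text for chunk in doc.noun_chunks]`.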

Noun extraction using NLTK

import nltk
from nltk import word_tokenize, pos_tag

text = "Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, \
that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial \
intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make \
predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide \
variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is \
difficult or unfeasible to develop conventional algorithms to perform the needed tasks."

Tokenize the text into words

tokens = word_tokenize(text)
tokens

['Machine',
 'learning',
 '(',
 'ML',
 ')',
 'is',
 'a',
 'field',
 'of',
 'inquiry',
 'devoted',
 'to',
 'understanding',
 'and',
 'building',
 'methods',
 'that',
 'learn',
 ',',
 'that',
 'is',
 ',',
 'methods',
 'that',
 'leverage',
 'data',
 'to',
 'improve',
 'performance',
 'on',
 'some',
 'set',
 'of',
 'tasks',
 '.',
 'It',
 'is',
 'seen',
 'as',
 'a',
 'part',
 'of',
 'artificial',
 'intelligence',
 '.',
 'Machine',
 'learning',
 'algorithms',
 'build',
 'a',
 'model',
 'based',
 'on',
 'sample',
 'data',
 ',',
 'known',
 'as',
 'training',
 'data',
 ',',
 'in',
 'order',
 'to',
 'make',
 'predictions',
 'or',
 'decisions',
 'without',
 'being',
 'explicitly',
 'programmed',
 'to',
 'do',
 'so',
 '.',
 'Machine',
 'learning',
 'algorithms',
 'are',
 'used',
 'in',
 'a',
 'wide',
 'variety',
 'of',
 'applications',
 ',',
 'such',
 'as',
 'in',
 'medicine',
 ',',
 'email',
 'filtering',
 ',',
 'speech',
 'recognition',
 ',',
 'and',
 'computer',
 'vision',
 ',',
 'where',
 'it',
 'is',
 'difficult',
 'or',
 'unfeasible',
 'to',
 'develop',
 'conventional',
 'algorithms',
 'to',
 'perform',
 'the',
 'needed',
 'tasks',
 '.']

Get the part-of-speech tag for each token

parts_of_speech = nltk.pos_tag(tokens)
parts_of_speech

[('Machine', 'NN'),
 ('learning', 'NN'),
 ('(', '('),
 ('ML', 'NNP'),
 (')', ')'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('field', 'NN'),
 ('of', 'IN'),
 ('inquiry', 'NN'),
 ('devoted', 'VBN'),
 ('to', 'TO'),
 ('understanding', 'JJ'),
 ('and', 'CC'),
 ('building', 'NN'),
 ('methods', 'NNS'),
 ('that', 'WDT'),
 ('learn', 'VBP'),
 (',', ','),
 ('that', 'DT'),
 ('is', 'VBZ'),
 (',', ','),
 ('methods', 'NNS'),
 ('that', 'IN'),
 ('leverage', 'NN'),
 ('data', 'NNS'),
 ('to', 'TO'),
 ('improve', 'VB'),
 ('performance', 'NN'),
 ('on', 'IN'),
 ('some', 'DT'),
 ('set', 'NN'),
 ('of', 'IN'),
 ('tasks', 'NNS'),
 ('.', '.'),
 ('It', 'PRP'),
 ('is', 'VBZ'),
 ('seen', 'VBN'),
 ('as', 'IN'),
 ('a', 'DT'),
 ('part', 'NN'),
 ('of', 'IN'),
 ('artificial', 'JJ'),
 ('intelligence', 'NN'),
 ('.', '.'),
 ('Machine', 'NNP'),
 ('learning', 'VBG'),
 ('algorithms', 'JJ'),
 ('build', 'VB'),
 ('a', 'DT'),
 ('model', 'NN'),
 ('based', 'VBN'),
 ('on', 'IN'),
 ('sample', 'NN'),
 ('data', 'NNS'),
 (',', ','),
 ('known', 'VBN'),
 ('as', 'IN'),
 ('training', 'NN'),
 ('data', 'NNS'),
 (',', ','),
 ('in', 'IN'),
 ('order', 'NN'),
 ('to', 'TO'),
 ('make', 'VB'),
 ('predictions', 'NNS'),
 ('or', 'CC'),
 ('decisions', 'NNS'),
 ('without', 'IN'),
 ('being', 'VBG'),
 ('explicitly', 'RB'),
 ('programmed', 'VBN'),
 ('to', 'TO'),
 ('do', 'VB'),
 ('so', 'RB'),
 ('.', '.'),
 ('Machine', 'NNP'),
 ('learning', 'VBG'),
 ('algorithms', 'NNS'),
 ('are', 'VBP'),
 ('used', 'VBN'),
 ('in', 'IN'),
 ('a', 'DT'),
 ('wide', 'JJ'),
 ('variety', 'NN'),
 ('of', 'IN'),
 ('applications', 'NNS'),
 (',', ','),
 ('such', 'JJ'),
 ('as', 'IN'),
 ('in', 'IN'),
 ('medicine', 'NN'),
 (',', ','),
 ('email', 'NN'),
 ('filtering', 'NN'),
 (',', ','),
 ('speech', 'NN'),
 ('recognition', 'NN'),
 (',', ','),
 ('and', 'CC'),
 ('computer', 'NN'),
 ('vision', 'NN'),
 (',', ','),
 ('where', 'WRB'),
 ('it', 'PRP'),
 ('is', 'VBZ'),
 ('difficult', 'JJ'),
 ('or', 'CC'),
 ('unfeasible', 'JJ'),
 ('to', 'TO'),
 ('develop', 'VB'),
 ('conventional', 'JJ'),
 ('algorithms', 'NNS'),
 ('to', 'TO'),
 ('perform', 'VB'),
 ('the', 'DT'),
 ('needed', 'JJ'),
 ('tasks', 'NNS'),
 ('.', '.')]
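These tags are enough to assemble simple noun phrases by hand. A minimal sketch, assuming the pattern "optional determiner, any adjectives, one or more nouns"; `chunk_noun_phrases` is a hypothetical helper, not an NLTK function, and the input is a short slice of the tagged output above so the snippet runs without NLTK:

```python
# A slice of the (word, tag) pairs from the output above.
tagged = [
    ("a", "DT"), ("model", "NN"), ("based", "VBN"), ("on", "IN"),
    ("sample", "NN"), ("data", "NNS"), (",", ","), ("known", "VBN"),
    ("as", "IN"), ("training", "NN"), ("data", "NNS"),
]


def chunk_noun_phrases(tagged_tokens):
    """Group runs of: optional determiner, adjectives, one or more nouns."""
    phrases = []
    i, n = 0, len(tagged_tokens)
    while i < n:
        j = i
        # Optional determiner (DT).
        if tagged_tokens[j][1] == "DT":
            j += 1
        # Any number of adjectives (JJ, JJR, JJS).
        while j < n and tagged_tokens[j][1].startswith("JJ"):
            j += 1
        # One or more nouns (NN, NNS, NNP, NNPS).
        noun_start = j
        while j < n and tagged_tokens[j][1].startswith("NN"):
            j += 1
        if j > noun_start:
            # At least one noun was found: keep the whole run as a phrase.
            phrases.append(" ".join(word for word, _ in tagged_tokens[i:j]))
            i = j
        else:
            # No noun here: move on to the next token.
            i += 1
    return phrases


noun_phrases = chunk_noun_phrases(tagged)
print(noun_phrases)
```

Because the phrases are built from the tags, any tagging mistake carries over into the chunks.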

Filter the singular nouns (tag NN) from the tagged tokens

nouns = list(filter(lambda x: x[1] == "NN", parts_of_speech))
nouns

[('Machine', 'NN'),
 ('learning', 'NN'),
 ('field', 'NN'),
 ('inquiry', 'NN'),
 ('building', 'NN'),
 ('leverage', 'NN'),
 ('performance', 'NN'),
 ('set', 'NN'),
 ('part', 'NN'),
 ('intelligence', 'NN'),
 ('model', 'NN'),
 ('sample', 'NN'),
 ('training', 'NN'),
 ('order', 'NN'),
 ('variety', 'NN'),
 ('medicine', 'NN'),
 ('email', 'NN'),
 ('filtering', 'NN'),
 ('speech', 'NN'),
 ('recognition', 'NN'),
 ('computer', 'NN'),
 ('vision', 'NN')]

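We got every token tagged NN, that is, the singular common nouns. Plural and proper nouns (tags NNS and NNP) are not included.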
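Penn Treebank noun tags all start with NN (NN, NNS, NNP, NNPS), so a prefix match keeps plural and proper nouns too, which an exact comparison with "NN" leaves out. A minimal sketch, using a short slice of the tagged output above so it runs without NLTK; the variable names are illustrative:

```python
# A slice of the (word, tag) pairs from the tagged output above.
tagged = [
    ("Machine", "NNP"), ("learning", "VBG"), ("algorithms", "NNS"),
    ("are", "VBP"), ("used", "VBN"), ("in", "IN"), ("a", "DT"),
    ("wide", "JJ"), ("variety", "NN"), ("of", "IN"),
    ("applications", "NNS"),
]

# Exact match: only singular common nouns.
only_nn = [(word, tag) for word, tag in tagged if tag == "NN"]

# Prefix match: NN, NNS, NNP and NNPS.
all_nouns = [(word, tag) for word, tag in tagged if tag.startswith("NN")]

print(only_nn)
print(all_nouns)
```

On the full text, the same comprehension would run over `parts_of_speech` instead of the hand-copied slice.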
