In this blog post, we will correct spelling mistakes (typos) in text using three Python libraries: TextBlob, autocorrect and pyspellchecker.
If you have already installed TextBlob, autocorrect and pyspellchecker, you can skip the installation step.
```bash
pip install TextBlob
pip install autocorrect
pip install pyspellchecker
```
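If you are not sure whether the packages are already installed, you can check their import names first. This is a minimal sketch using only the standard library. Note that pyspellchecker is imported as `spellchecker`, and `installed_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names of the three packages (pyspellchecker is imported as "spellchecker").
MODULES = ["textblob", "autocorrect", "spellchecker"]


def installed_modules(names):
    """Map each module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in installed_modules(MODULES).items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```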
```python
from textblob import TextBlob

text = "Machine learnning is a branch of artifecial intelligence and computer sciance."
```
As you can see, I have deliberately misspelled "learning", "artificial" and "science". Let us check whether the TextBlob library can correct these mistakes.
```python
# correct() returns a new TextBlob containing the corrected text
TextBlob(text).correct()
```
In this example, it corrected the spelling errors.
```python
text1 = "John is gud boy and he plays fotball"
TextBlob(text1).correct()
```
In this example, it did not correct "gud" to "good", but it did correct "fotball" to "football".
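TextBlob's documentation says its `correct()` method is based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below is a minimal, self-contained version of that idea, not TextBlob's actual code. `WORD_COUNTS` is a hypothetical toy vocabulary. It suggests one plausible reason why "gud" did not become "good". The algorithm only looks at words two edits away when no known word is one edit away. A one-edit word such as "god" would therefore win over "good", which needs two edits, even if "good" is more frequent.

```python
from collections import Counter

# Hypothetical toy word counts; a real corrector derives its counts from a large corpus.
WORD_COUNTS = Counter({
    "john": 20, "is": 900, "good": 500, "god": 50, "boy": 80,
    "and": 1000, "he": 600, "plays": 30, "football": 40,
})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """The subset of words that appear in the vocabulary."""
    return {w for w in words if w in WORD_COUNTS}


def correction(word):
    """Most frequent known word among the closest candidates (0, then 1, then 2 edits)."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORD_COUNTS[w])


def correct_sentence(text):
    """Correct each whitespace-separated word (lowercased for simplicity)."""
    return " ".join(correction(w) for w in text.lower().split())


print(correct_sentence("John is gud boy and he plays fotball"))
# → john is god boy and he plays football
```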
In the next example, the text data is a list of sentences. We will convert the list into a pandas DataFrame and then use `apply` with a lambda function to correct the spelling of each sentence.
```python
import pandas as pd

text3 = [
    'Natural languuage procesing is a branch of computer sciance and artifecial intelligance.',
    'The Python programing language provides a wide range of tools and libraries for NLP tassks.',
    'NLTK, an open source colection of libraries for buildding NLP programs.',
    'Spacy and Textblob libraries are also quite popular.',
    'NLP tasks are very interresting.'
]
```
I have deliberately introduced typos into the list. The misspelled words are:
- languuage
- procesing
- sciance
- artifecial
- intelligance
- tassks
- colection
- buildding
- interresting
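The `pd.set_option` call below is optional. It widens pandas' column display so that long sentences are not truncated when printed. You can skip it if your sentences are short.
```python
pd.set_option('display.max_colwidth', 500)
df = pd.DataFrame({'text': text3})
print(df)
# Correct each sentence; str() turns the returned TextBlob back into plain text
df['text'].apply(lambda x: str(TextBlob(x).correct()))
```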
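Note that `apply` returns a new Series and leaves `df` itself unchanged. Below is a minimal sketch of keeping the original and corrected text side by side and flagging which rows changed. `FIXES` and `fix_typos` are hypothetical stand-ins for `TextBlob(x).correct()`, so the example runs without a spelling library. With TextBlob you would pass `lambda x: str(TextBlob(x).correct())` instead. The word-by-word comparison in `changed_words` assumes the corrector replaces words one for one, which holds for this stand-in.

```python
import pandas as pd

# Hypothetical stand-in for TextBlob(x).correct(): a fixed typo -> fix lookup.
FIXES = {"languuage": "language", "procesing": "processing", "tassks": "tasks"}


def fix_typos(sentence):
    """Replace known typos word by word (whitespace tokenization)."""
    return " ".join(FIXES.get(word, word) for word in sentence.split())


def changed_words(original, corrected):
    """(original, corrected) word pairs that differ; assumes a one-for-one word mapping."""
    return [(a, b) for a, b in zip(original.split(), corrected.split()) if a != b]


df = pd.DataFrame({"text": [
    "Natural languuage procesing is fun",
    "NLP tassks are very interesting",
    "Spacy is popular",
]})

# Assign the result of apply() to keep both versions in the DataFrame.
df["corrected"] = df["text"].apply(fix_typos)
df["changed"] = df["text"] != df["corrected"]
print(df)

for original, corrected in zip(df["text"], df["corrected"]):
    print(changed_words(original, corrected))
```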
TextBlob corrected the following spellings:
- languuage - language
- procesing - processing
- sciance - science
- artifecial - artificial
- intelligance - intelligence
- tassks - tasks
- colection - collection
- buildding - building
- interresting - interesting
All of the misspellings listed above were corrected.
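Instead of comparing the output with the list by eye, you could check it programmatically. This is a minimal sketch. `EXPECTED` is a hypothetical mapping copied from the list of misspellings above, and `missed_corrections` is a hypothetical helper. You would pass it the corrected sentences, for example the result of the `apply` call converted with `.tolist()`.

```python
import re

# Hypothetical expected fixes, mirroring the list of misspellings above.
EXPECTED = {
    "languuage": "language", "procesing": "processing", "sciance": "science",
    "artifecial": "artificial", "intelligance": "intelligence", "tassks": "tasks",
    "colection": "collection", "buildding": "building", "interresting": "interesting",
}


def missed_corrections(corrected_texts, expected):
    """Return the (typo, fix) pairs whose typo still appears or whose fix is absent."""
    words = set()
    for text in corrected_texts:
        words.update(re.findall(r"[a-z]+", text.lower()))
    return [(typo, fix) for typo, fix in expected.items()
            if typo in words or fix not in words]
```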
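Next, let us try the autocorrect library. Its `Speller` object can be called directly on a whole sentence.
```python
from autocorrect import Speller

spell = Speller()  # English ('en') by default
text4 = "Machine learnning is a branch of artifecial intelligence and computer sciance."
spell(text4)
```
It corrected the spelling of "learning", "artificial" and "science".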
The same speller can also be applied to each sentence in a DataFrame:
```python
spell1 = Speller()
text5 = [
    'Natural languuage procesing is a branch of computer sciance and artifecial intelligance.',
    'The Python programing language provides a wide range of tools and libraries for NLP tassks.',
    'NLTK, an open source colection of libraries for buildding NLP programs.',
    'Spacy and Textblob libraries are also quite popular.',
    'NLP tasks are very interresting.'
]
df1 = pd.DataFrame({'text': text5})
df1
# Correct each sentence with the autocorrect speller
df1['text'].apply(lambda x: str(spell1(x)))
```
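Finally, let us try the pyspellchecker module, which is imported as `spellchecker`. It works on individual words rather than whole sentences. `unknown()` returns the words that are not in its dictionary, `correction()` returns the most likely correction for a word, and `candidates()` returns the set of likely options.
```python
import re
from spellchecker import SpellChecker

spell2 = SpellChecker()  # English dictionary by default
text6 = "Machine learnning is a branch of artifecial intelligence and computer sciance."
# Split into words; a plain text6.split() would keep the final "." attached to "sciance."
text6 = re.findall(r"\w+", text6)
text6
# Words not found in the dictionary (returned lowercased, as a set)
misspelled_text6 = spell2.unknown(text6)
misspelled_text6
for word in misspelled_text6:
    # Get the one 'most likely' correction
    print(spell2.correction(word))
    # Get a set of 'likely' options
    print(spell2.candidates(word))
```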
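This prints the most likely correction for each misspelled word, followed by its set of candidate corrections. Because `unknown()` returns a set, the order of the output may vary between runs.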
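Because pyspellchecker works word by word, it does not return a corrected sentence the way TextBlob and autocorrect do. Below is a minimal sketch of substituting per-word corrections back into the original text while keeping punctuation and spacing. It assumes a lowercase typo-to-fix mapping, and `apply_corrections` is a hypothetical helper, not part of pyspellchecker.

```python
import re


def apply_corrections(text, corrections):
    """Replace each word in text using corrections (a lowercase typo -> fix mapping).

    Punctuation and spacing are left untouched; words without a correction, or whose
    correction is None, are kept as they are.
    """
    def replace(match):
        word = match.group(0)
        fixed = corrections.get(word.lower())
        if not fixed:
            return word
        # Keep an initial capital letter, e.g. at the start of a sentence.
        return fixed.capitalize() if word[0].isupper() else fixed

    return re.sub(r"[A-Za-z']+", replace, text)


# Hypothetical corrections, of the kind SpellChecker.correction() returns per word.
corrections = {"learnning": "learning", "artifecial": "artificial", "sciance": "science"}
text = "Machine learnning is a branch of artifecial intelligence and computer sciance."
print(apply_corrections(text, corrections))
# → Machine learning is a branch of artificial intelligence and computer science.
```

With the objects above, such a mapping could be built as `{w: spell2.correction(w) for w in misspelled_text6}`, provided the words were split without attached punctuation. The `if not fixed` check also covers the case where `correction()` finds no candidate, which in recent pyspellchecker versions may be reported as `None`.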