Technical •January 15, 2023• 5 Min. Read

TextBlob: Most widely used library for NLP

TextBlob is an open-source Python library for processing unstructured textual data. It provides an API for tasks such as noun phrase extraction, part-of-speech tagging, sentiment analysis, classification, and tokenization.

Using TextBlob for NLP

Most of the data generated today is unstructured. The wide variety of textual data available is a valuable source of information across industries. Natural Language Processing (NLP), which combines linguistics, computer science, and artificial intelligence, plays an important role in decoding this information.

The most popular Python library for Natural Language Processing is NLTK. However, it is extensive and can be complicated to understand and apply. TextBlob is a simpler option that exposes much of NLTK's functionality through an easier interface.

To get started, install TextBlob as follows:

$ pip install -U textblob
$ python -m textblob.download_corpora

These commands install TextBlob and download the necessary NLTK corpora.
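If you are unsure whether the installation succeeded, a quick check from Python can help. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of TextBlob.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Remind the user how to install TextBlob if it cannot be found.
if not is_installed("textblob"):
    print("TextBlob is missing; run: pip install -U textblob")
```

This only confirms that the package is importable. It does not check whether the NLTK corpora were downloaded.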

Creating a TextBlob object

Import the TextBlob class:
from textblob import TextBlob

To create a TextBlob object:
Twit1 = TextBlob("I really like this product!")

Sentiment Analysis with TextBlob

TextBlob offers a sentiment property, which returns a named tuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0], where -1.0 is most negative and 1.0 is most positive. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.

Example:
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")

testimonial.sentiment
# Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)

testimonial.sentiment.polarity
# 0.39166666666666666
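In practice, a raw polarity score is often turned into a coarse label. The sketch below shows one way to do that. The function names and the 0.1 cutoff are assumptions chosen for illustration, not something TextBlob prescribes, so tune the threshold for your own data.

```python
def label_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold is an illustrative assumption, not a TextBlob default.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def label_text(text: str) -> str:
    """Label a piece of text using TextBlob's polarity score."""
    # Imported here so label_polarity stays usable without TextBlob installed.
    from textblob import TextBlob
    return label_polarity(TextBlob(text).sentiment.polarity)


print(label_polarity(0.39166666666666666))  # positive
print(label_polarity(-0.5))                 # negative
print(label_polarity(0.0))                  # neutral
```

With TextBlob installed, label_text("I really like this product!") applies the same mapping to the polarity TextBlob computes for that sentence.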

Tokenization

Tokenization refers to the task of breaking a TextBlob object into words or sentences.

For example:
zen = TextBlob("Beautiful is better than ugly. "
               "Explicit is better than implicit. "
               "Simple is better than complex.")
zen.words
# WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])

zen.sentences
# [Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]

Sentence objects have the same properties and methods as TextBlob objects.

For example:
for sentence in zen.sentences:
    print(sentence.sentiment)
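Once text is tokenized into words, a common next step is counting how often each word appears. The sketch below uses a plain Python list standing in for zen.words so it runs without TextBlob. The helper name word_frequencies is hypothetical. TextBlob also provides a built-in word_counts property on blobs for the same purpose.

```python
from collections import Counter


def word_frequencies(words):
    """Count case-insensitive occurrences of each word in an iterable of strings."""
    return Counter(word.lower() for word in words)


# Plain list standing in for zen.words (Word objects are strings, so they work too).
words = ["Beautiful", "is", "better", "than", "ugly",
         "Explicit", "is", "better", "than", "implicit",
         "Simple", "is", "better", "than", "complex"]

freq = word_frequencies(words)
print(freq["better"])     # 3
print(freq["beautiful"])  # 1
```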

Word Inflection and Lemmatization

Each word in TextBlob.words or Sentence.words is a Word object (a subclass of str in Python 3, unicode in Python 2) with methods for word inflection.

For example:
sentence = TextBlob("Use 4 spaces per indentation level.")
sentence.words
# WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
sentence.words[2].singularize()
# 'space'
sentence.words[-1].pluralize()
# 'levels'

Words can be lemmatized by calling the lemmatize method as follows:

from textblob import Word
w = Word("octopi")
w.lemmatize()
# 'octopus'
w = Word("went")
w.lemmatize("v")  # Pass in WordNet part of speech (verb)
# 'go'

Spelling Correction

You can correct spelling with the correct() method.
b = TextBlob("What a dey!")
print(b.correct())
# What a day!

Word objects have a spellcheck() method, Word.spellcheck(), that returns a list of (word, confidence) tuples with spelling suggestions.

from textblob import Word
w = Word("falibility")
w.spellcheck()
# [('fallibility', 1.0)]

Spelling corrections with this method are about 70% accurate.
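Because each suggestion carries a confidence score, you may want to accept a correction only when that score is high enough. The sketch below works on the list-of-tuples shape that spellcheck() returns. The helper name best_suggestion, the 0.5 cutoff, and the second sample list are assumptions made up for illustration.

```python
def best_suggestion(suggestions, min_confidence=0.5):
    """Return the highest-confidence word, or None if nothing clears the cutoff.

    `suggestions` is a list of (word, confidence) tuples, the shape
    returned by Word.spellcheck(). The cutoff is an illustrative assumption.
    """
    if not suggestions:
        return None
    word, confidence = max(suggestions, key=lambda pair: pair[1])
    return word if confidence >= min_confidence else None


# Result shown in the article for Word("falibility").spellcheck()
print(best_suggestion([("fallibility", 1.0)]))        # fallibility

# Hypothetical low-confidence suggestions, invented for this example
print(best_suggestion([("day", 0.4), ("dry", 0.3)]))  # None
```

Returning None for low-confidence cases lets the caller keep the original word instead of applying a doubtful correction.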

You can also perform tasks such as parsing, extracting n-grams, integrating with WordNet, and adding new models or languages through extensions.
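To make the idea of n-grams concrete, here is a plain-Python sketch that builds them from a list of words. It illustrates the concept only and is not TextBlob's implementation. With TextBlob itself you would call the ngrams(n=...) method on a blob. The sample word list is an assumption chosen for the example.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a list of lists."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of size n across the word list.
    return [list(words[i:i + n]) for i in range(len(words) - n + 1)]


words = ["Now", "is", "better", "than", "never"]
for gram in ngrams(words, n=3):
    print(gram)
# ['Now', 'is', 'better']
# ['is', 'better', 'than']
# ['better', 'than', 'never']
```

If the input has fewer than n words, the function returns an empty list.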