Most of the data generated today is unstructured. Textual data in particular is a valuable source of information across many industries. Natural Language Processing (NLP), which combines linguistics, computer science, and artificial intelligence, plays an important role in decoding this information.
One of the most popular Python libraries for Natural Language Processing is NLTK. However, it is extensive and can be complicated to understand and apply. TextBlob is a simpler option: it is built on top of NLTK and exposes much of its functionality through a more approachable interface. It is an open-source Python library for processing unstructured textual data, and it provides an API for tasks such as noun phrase extraction, part-of-speech tagging, sentiment analysis, classification, and tokenization.
To get started, install TextBlob as follows:
$ pip install -U textblob
$ python -m textblob.download_corpora
These commands install TextBlob and download the NLTK corpora it needs.
Creating a TextBlob object
Import the TextBlob class:
from textblob import TextBlob
To create a TextBlob object:
Twit1 = TextBlob("I really like this product!")
Sentiment Analysis with TextBlob
TextBlob offers a sentiment property that returns a named tuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0], where -1.0 is the most negative and 1.0 is the most positive. The subjectivity score is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
testimonial = TextBlob("This product was very useful, great value for money")
testimonial.sentiment
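A raw polarity score is often easier to act on once it is mapped to a label. The following is a minimal sketch of that idea. The helper names label_polarity and summarize_sentiment and the 0.1 threshold are assumptions made for illustration; they are not part of TextBlob.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold is an arbitrary, illustrative cut-off, not a TextBlob default.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be within [-1.0, 1.0]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def summarize_sentiment(text):
    """Return (label, polarity, subjectivity) for a piece of text."""
    # Imported here so that label_polarity can be used without TextBlob installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment  # named tuple: (polarity, subjectivity)
    return label_polarity(sentiment.polarity), sentiment.polarity, sentiment.subjectivity
```

Calling summarize_sentiment on the testimonial text would return the label together with the two raw scores. The exact numbers depend on TextBlob's sentiment lexicon, so the threshold is worth tuning on your own data.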
Tokenization
Tokenization refers to the task of breaking a TextBlob object into words or sentences.
zen = TextBlob("Beautiful is better than ugly. "
"Explicit is better than implicit. "
"Simple is better than complex.")
zen.words
WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])
zen.sentences
[Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]
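To make the idea of tokenization concrete, here is a deliberately naive sketch using only regular expressions. It is not how TextBlob tokenizes text (TextBlob relies on NLTK's tokenizers); the function names naive_words and naive_sentences are made up for this illustration.

```python
import re


def naive_words(text):
    """Split text into word tokens, dropping punctuation (simplified)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def naive_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace (simplified)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


zen_text = ("Beautiful is better than ugly. "
            "Explicit is better than implicit. "
            "Simple is better than complex.")

print(naive_words(zen_text)[:5])   # ['Beautiful', 'is', 'better', 'than', 'ugly']
print(len(naive_sentences(zen_text)))  # 3
```

A rule this simple breaks on abbreviations such as "Dr. Smith", which is one reason to prefer a library tokenizer for real text.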
Sentence objects have the same properties and methods as TextBlob objects.
for sentence in zen.sentences:
    print(sentence.sentiment)
Words Inflection and Lemmatization
Each word in TextBlob.words or Sentence.words is a Word object (a subclass of str in Python 3) with methods for word inflection.
For example:
sentence = TextBlob('Use 4 spaces per indentation level.')
sentence.words
WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
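Word inflection is available through the singularize() and pluralize() methods of Word objects. The sketch below assumes TextBlob is installed and applies both methods to words taken from the example sentence; the helper name inflect_examples is made up for illustration.

```python
def inflect_examples():
    """Return the singular of 'spaces' and the plural of 'level'."""
    # Imported inside the function so this module loads even without TextBlob.
    from textblob import Word

    singular = Word("spaces").singularize()  # plural noun -> singular form
    plural = Word("level").pluralize()       # singular noun -> plural form
    return singular, plural
```

Both methods are rule-based, so irregular nouns may not always be inflected the way you expect.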
Words can be lemmatized by calling the lemmatize method as follows:
from textblob import Word
w = Word("octopi")
w.lemmatize() # Default part of speech is noun
w = Word("went")
w.lemmatize("v") # Pass in WordNet part of speech (verb)
Spelling Correction
You can attempt spelling correction with the correct() method.
b = TextBlob("What a dey!")
print(b.correct())
What a day!
Word objects have a spellcheck() method, Word.spellcheck(), that returns a list of (word, confidence) tuples with spelling suggestions.
from textblob import Word
w = Word('falibility')
w.spellcheck()
Spelling correction in TextBlob is only about 70% accurate, so review its suggestions before relying on them.
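According to TextBlob's documentation, its spelling correction follows the approach in Peter Norvig's "How to Write a Spelling Corrector". The sketch below shows the core of that idea in a heavily simplified form: generate every string one edit away and keep the most frequent known word. The WORD_COUNTS table is a tiny, made-up vocabulary, and this is not TextBlob's actual implementation.

```python
import string

# Tiny, made-up word-frequency table used only for this illustration.
WORD_COUNTS = {"day": 50, "dry": 10, "they": 40, "what": 30, "a": 100}


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts=WORD_COUNTS):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word
    return max(candidates, key=counts.get)


print(correct("dey"))  # day
```

With such a small vocabulary, the result depends entirely on which words happen to be in the table, which hints at why a frequency-based corrector can be wrong fairly often.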
TextBlob also supports parsing, n-grams, WordNet integration, and adding new models or languages through extensions.
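As an example of one of these features, an n-gram is simply a run of n consecutive words. TextBlob exposes this through the ngrams(n=3) method of a TextBlob object; the plain-Python sketch below shows what that operation amounts to. The standalone ngrams function here is an illustrative stand-in, not the library's code.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive tokens from a list of words."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [words[i:i + n] for i in range(len(words) - n + 1)]


tokens = "Now is better than never".split()
for gram in ngrams(tokens, n=3):
    print(gram)
# ['Now', 'is', 'better']
# ['is', 'better', 'than']
# ['better', 'than', 'never']
```

If the text has fewer than n words, the function returns an empty list.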