TextBlob is a Python (2 and 3) library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more.
TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat TextBlob objects as if they were Python strings that learned how to do Natural Language Processing.
I have used this library to create a custom classifier. I have not included the data I used; instead, I am using the sample data provided on the TextBlob site. In my experience this library gives very good results and compares favorably with tools like the IBM Watson NLC API. It is also open source.
Installation :
Using pip:
pip install -U textblob
python -m textblob.download_corpora
Using Conda:
conda install -c https://conda.anaconda.org/sloria textblob
python -m textblob.download_corpora
We have now successfully installed TextBlob. Next, we will create a text classifier.
1. Import required libraries
import os
path = "your dir path"
os.chdir(path)
from textblob.classifiers import NaiveBayesClassifier
2. Load Train and Test Data
train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg')
]
test = [
    ('The beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]
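In practice the labelled examples usually live in a file rather than in the script itself. Below is a minimal sketch, assuming a two-column CSV of text and label, that produces the same list-of-tuples shape used for the training data. The sample rows and the `load_labelled_rows` helper are hypothetical, and an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical CSV content: one "text,label" row per example.
SAMPLE_CSV = '''I love this sandwich.,pos
"He is my sworn enemy!",neg
'''

def load_labelled_rows(handle):
    """Read (text, label) tuples from a two-column CSV file handle."""
    # Skip blank rows so a trailing newline cannot break the unpacking.
    return [(row[0], row[1]) for row in csv.reader(handle) if row]

# io.StringIO behaves like an open text file for this illustration.
rows = load_labelled_rows(io.StringIO(SAMPLE_CSV))
print(rows)
```

With a real file, the `io.StringIO(...)` call would be replaced by an `open(...)` call on the CSV path.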
3. Create Classifier
cl = NaiveBayesClassifier(train)
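To give a rough idea of what the classifier learns from: the TextBlob documentation describes the default feature extractor as indicating which words from the training set are contained in a document. The sketch below imitates that idea in plain Python. The `tokenize`, `build_vocabulary`, and `extract_features` helpers, the lowercasing, and the whitespace tokenization are simplifying assumptions for illustration, not TextBlob's actual implementation.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    This is an assumed, simplified tokenizer for the sketch only.
    """
    stripped = (w.strip(".,!?\"'") for w in text.lower().split())
    return [w for w in stripped if w]

def build_vocabulary(dataset):
    """Collect every distinct word seen in the training texts."""
    return {word for text, _label in dataset for word in tokenize(text)}

def extract_features(text, vocabulary):
    """Map each vocabulary word to True/False: does the text contain it?"""
    present = set(tokenize(text))
    return {f"contains({word})": word in present for word in vocabulary}

sample = [("I love this sandwich.", "pos"), ("My boss is horrible.", "neg")]
vocab = build_vocabulary(sample)
features = extract_features("I love my boss", vocab)
print(features["contains(love)"], features["contains(horrible)"])
```

Each document becomes a dictionary of boolean features, and those features are the f1 ... fn that the Naive Bayes formula multiplies over.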
4. Check with some random samples
cl.classify("Their burgers are amazing")
# "pos"
cl.classify("I don't like their pizza.")
# "neg"
5. Check accuracy for test data
cl.accuracy(test)
# 0.8333333333333334
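The accuracy figure is the fraction of test examples whose predicted label matches the true label, so with six test sentences a value of 0.8333... corresponds to five correct predictions. Here is a minimal sketch of that calculation, using a hypothetical `accuracy` helper and a made-up list of predictions rather than the classifier's real output:

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# True labels of the six test sentences.
true_labels = ["pos", "neg", "neg", "pos", "pos", "neg"]
# Hypothetical predictions with exactly one mistake (illustration only;
# which sentence the real classifier gets wrong is not stated here).
predicted = ["pos", "neg", "neg", "pos", "neg", "neg"]

score = accuracy(true_labels, predicted)
print(score)
```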
How Does the Naive Bayes Algorithm Work?
""" A classifier based on the Naive Bayes algorithm. In order to find the probability for a label, this algorithm first uses the Bayes rule to express P(label|features) in terms of P(label) and P(features|label): | P(label) * P(features|label) | P(label|features) = ------------------------------------------------------ | P(features) The algorithm then makes the 'naive' assumption that all features are independent, given the label: | P(label) * P(f1|label) * ... * P(fn|label) | P(label|features) = --------------------------------------------------------------------------------------------- | P(features) Rather than computing P(features) explicitly, the algorithm just calculates the numerator for each label, and normalizes them so they sum to one: | P(label) * P(f1|label) * ... * P(fn|label) | P(label|features) = ------------------------------------------------------------------------------------------------ | SUM[l]( P(l) * P(f1|l) * ... * P(fn|l) ) """
That's it. We have created a simple text classifier that can classify short texts, such as tweets, into two categories: pos and neg.
References :
http://textblob.readthedocs.io/en/dev/classifiers.html