Topic Modeling With Naive Bayes Classifier

This article was published as a part of the Data Science Blogathon

Introduction

Naive Bayes is a simple probabilistic classifier that applies Bayes’ Theorem under the “naive” assumption that features are conditionally independent given the class. It is commonly used for Natural Language Processing (NLP) tasks such as text categorization. Today, we will construct a Naive Bayes text classifier for topic categorization.
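To make the idea concrete before we go further, here is a minimal sketch of how such a classifier could work. It is not the program built in this article: it uses only the Python standard library, the function names `train` and `predict` are hypothetical, and the four training sentences are made up for illustration. The sketch scores each class by adding the log of its prior to the logs of the per-word likelihoods, with add-one (Laplace) smoothing so that a word unseen in one class does not zero out the whole score.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels):
    """Count documents and words per class (hypothetical helper)."""
    class_doc_counts = Counter(labels)   # number of documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()
    for text, label in zip(docs, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_doc_counts, word_counts, vocab


def predict(text, class_doc_counts, word_counts, vocab):
    """Return the class with the highest log-posterior score."""
    total_docs = sum(class_doc_counts.values())
    scores = {}
    for label, n_docs in class_doc_counts.items():
        # Log prior: fraction of training documents in this class
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothed likelihood of the word given the class
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy, made-up training data (an assumption, not from the article)
docs = [
    "the team won the match",
    "the player scored a goal",
    "the election results were announced",
    "the senate passed the bill",
]
labels = ["sports", "sports", "politics", "politics"]

model = train(docs, labels)
print(predict("the player won", *model))   # sports
print(predict("senate election", *model))  # politics
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer documents.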

Before we move forward with the explanation, I want to emphasize that Naive Bayes is not the traditional method of classifying topics. In fact, other models were invented specifically for working with topics, such as Blei’s landmark Latent Dirichlet Allocation. Although Naive Bayes enters a fairly competitive field in topic categorization, its simplicity and accessible mathematical foundation make it an attractive tool for developers.

Without any further ado, let’s get Naive.

Packages and Datasets

For this program, we need only two packages: pandas and textmining. The former will help us input our data,

