Topic Modeling and Latent Dirichlet Allocation (LDA) using Gensim and Sklearn: Part 1

This article was published as a part of the Data Science Blogathon

Introduction

Let’s say you have a client who runs a publishing house. Your client comes to you with two tasks: first, he wants to categorize all the books and research papers he receives each week by common theme or topic; second, he wants to condense large documents into smaller, bite-sized texts. Is there a technique or tool that can handle both tasks?

Lo and behold! We enter the world of Topic Modeling. I’ll break this article into three parts. In the current one, we’ll explore the basics: how text data is viewed in Natural Language Processing, what topics are, and what topic modeling is.

We shall also look at the applications of topic modeling, where it is used, the methodologies for performing it, and the types of models available.

In the second article, we will dive in-depth into the most popular topic modeling technique
