Python for NLP: Developing an Automatic Text Filler using N-Grams

python_tutorials

This is the 15th article in my series of articles on Python for NLP. In my previous article, I explained how to implement TF-IDF approach from scratch in Python. Before that we studied, how to implement bag of words approach from scratch in Python.

Today, we will study the N-Grams approach and will see how the N-Grams approach can be used to create a simple automatic text filler or suggestion engine. Automatic text filler is a very useful application and is widely used by Google and different smartphones where a user enters some text and the remaining text is automatically populated or suggested by the application.

Problems with TF-IDF and Bag of Words Approach

Before we go and actually implement the N-Grams model, let us first discuss the drawback of the bag of words and TF-IDF approaches.

In the bag of words and TF-IDF approach, words are treated individually and every single word is converted into its numeric counterpart. The context information of the word is not retained. Consider two sentences “big red machine and carpet” and “big red carpet and machine”. If you use a bag of words approach, you

To finish reading, please visit source site