A friendly guide to NLP: Bag-of-Words with Python example

1. A Quick Example

Let’s look at a simple example to make the concepts explained previously more concrete. Suppose we want to analyze the following reviews of Game of Thrones:

Review 1: Game of Thrones is an amazing tv series!

Review 2: Game of Thrones is the best tv series!

Review 3: Game of Thrones is so great

In the table below, I show all the calculations needed to obtain the Bag-of-Words representation:

[Table image: Bag-of-Words with Python example]

Each row corresponds to a different review, while the columns are the unique words contained in the three documents.
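A table like this one can be computed with a few lines of standard-library Python. This is a minimal sketch, not the exact procedure behind the table: it assumes the reviews are lowercased and split with a simple regex tokenizer (`\w+`), and the helper name `tokenize` is hypothetical.

```python
import re
from collections import Counter

# The three reviews from the example above
reviews = [
    "Game of Thrones is an amazing tv series!",
    "Game of Thrones is the best tv series!",
    "Game of Thrones is so great",
]

def tokenize(text):
    # Hypothetical helper: lowercase, then keep runs of word characters,
    # which drops punctuation such as "!"
    return re.findall(r"\w+", text.lower())

tokenized = [tokenize(review) for review in reviews]

# Vocabulary: every unique word across all reviews, sorted for a stable column order
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# One row per review: how many times each vocabulary word appears in it
matrix = []
for tokens in tokenized:
    counts = Counter(tokens)
    matrix.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in matrix:
    print(row)
```

Sorting the vocabulary is only a convenience so that the columns always come out in the same order; the column order in the table above may differ, but the counts per review are what define the Bag-of-Words vectors.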

2. Implementation with Python

Let’s import the libraries and define the variables that contain the reviews:

# Libraries used in this example
import pandas as pd
import numpy as np
import collections

# The three reviews, one string per document
doc1 = 'Game of Thrones is an amazing tv series!'
doc2 = 'Game of Thrones is the best tv series!'
doc3 = 'Game of Thrones is so great'

We need to remove punctuation, one of the text pre-processing steps I showed in the previous post.
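One common way to strip punctuation in plain Python is `str.translate` with a deletion table built from `string.punctuation`. This is a sketch of that approach, not necessarily the method used in the pre-processing post; it repeats the three variables so it runs on its own, and the names `table`, `cleaned` and `tokens` are illustrative.

```python
import string

doc1 = 'Game of Thrones is an amazing tv series!'
doc2 = 'Game of Thrones is the best tv series!'
doc3 = 'Game of Thrones is so great'

# str.maketrans with a third argument builds a table that deletes those characters;
# string.punctuation holds the ASCII punctuation marks, including "!"
table = str.maketrans('', '', string.punctuation)

# Remove punctuation from each review
cleaned = [doc.translate(table) for doc in (doc1, doc2, doc3)]

# Split each cleaned review on whitespace to get its words
tokens = [doc.split() for doc in cleaned]

for words in tokens:
    print(words)
```

Note that `string.punctuation` only covers ASCII characters, so typographic quotes or dashes would survive this step, and the text keeps its original casing ("Game" and "game" would count as different words unless you also call `.lower()`).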
