How to Prepare Movie Review Data for Sentiment Analysis (Text Classification)

Last Updated on August 14, 2020

Text data preparation is different for each problem.

Preparation starts with simple steps, like loading data, but quickly gets difficult with cleaning tasks that are very specific to the data you are working with. It helps to know where to begin and in what order to work through the steps, from raw data to data ready for modeling.

In this tutorial, you will discover how to prepare movie review text data for sentiment analysis, step-by-step.

After completing this tutorial, you will know:

  • How to load text data and clean it to remove punctuation and other non-words.
  • How to develop a vocabulary, tailor it, and save it to file.
  • How to prepare movie reviews using cleaning and a pre-defined vocabulary and save them to new files ready for modeling.
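
As a rough preview of those three steps, here is a minimal sketch using only the Python standard library. The function names, the tiny stop-word list, the three sample reviews, and the minimum-occurrence threshold of 2 are all assumptions made for illustration; they are not the code or data used in the full tutorial.

```python
import os
import string
import tempfile
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real project would use a fuller list).
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "was", "of", "to"}

def clean_doc(text):
    """Turn raw text into lowercase tokens without punctuation, non-words, or stop words."""
    table = str.maketrans("", "", string.punctuation)
    tokens = [word.translate(table).lower() for word in text.split()]
    # Keep purely alphabetic tokens longer than one character that are not stop words.
    return [t for t in tokens if t.isalpha() and len(t) > 1 and t not in STOP_WORDS]

def build_vocab(docs, min_count=2):
    """Count tokens across all documents and keep those seen at least min_count times."""
    counts = Counter()
    for doc in docs:
        counts.update(clean_doc(doc))
    return {token for token, count in counts.items() if count >= min_count}

def save_list(lines, filename):
    """Write one item per line to a text file."""
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines))

def filter_doc(text, vocab):
    """Clean a document and keep only tokens that appear in the vocabulary."""
    return " ".join(t for t in clean_doc(text) if t in vocab)

# Hypothetical sample reviews standing in for the real dataset.
reviews = [
    "This movie was great, a truly great film!",
    "The film was boring... and the plot was bad.",
    "Great plot; bad acting, 2 stars.",
]

vocab = build_vocab(reviews, min_count=2)
vocab_path = os.path.join(tempfile.gettempdir(), "vocab.txt")
save_list(sorted(vocab), vocab_path)

prepared = [filter_doc(review, vocab) for review in reviews]
print(sorted(vocab))
print(prepared)
```

Building the vocabulary first and filtering each review against it keeps rare or noisy tokens out of the files that are later passed to a model.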

Kick-start your project with my new book Deep Learning for Natural Language Processing, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • Update Oct/2017: Fixed a small bug when skipping non-matching files, thanks Jan Zett.
  • Update Dec/2017: Fixed a small typo in full example, thanks Ray and Zain.
  • Update Aug/2020: Updated link to movie review dataset.