Steps for effective text data cleaning (with case study using Python)

Introduction

The days when one would get data in tabulated spreadsheets are truly behind us. A moment of silence for the data residing in the spreadsheet pockets. Today, more than 80% of data is unstructured: it either sits in data silos or is scattered across digital archives. Data is being produced as we speak, from every conversation we have on social media to every piece of content generated by news sources. To produce any meaningful, actionable insight from data, it is important to know how to work with it in its unstructured form. As a Data Scientist at one of the fastest-growing Decision Sciences firms, my bread and butter comes from deriving meaningful insights from unstructured text information.

Mining Twitter Data

Pre-processing is one of the first steps in working with text data, and it is essential before the data is ready for analysis. Majority
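To make the idea of pre-processing concrete, here is a minimal sketch of what cleaning a single tweet might look like in Python. The excerpt does not list the actual steps, so the specific choices below (decoding HTML entities, dropping URLs and @mentions, lowercasing, stripping punctuation) and the function name `clean_tweet` are assumptions made for illustration, not the author's method.

```python
import html
import re

# Patterns are illustrative assumptions, not an exhaustive tweet grammar.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Return a lowercased, punctuation-free version of a raw tweet."""
    text = html.unescape(text)          # "&amp;" becomes "&"
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @mentions
    text = text.lower()
    text = NON_ALNUM_RE.sub(" ", text)  # drop punctuation and symbols
    return WHITESPACE_RE.sub(" ", text).strip()


raw = "Check THIS out!! &amp; more at https://example.com @user"
print(clean_tweet(raw))  # check this out more at
```

Which steps are appropriate depends on the analysis: stripping punctuation, for example, also removes emoticons and apostrophes, which may matter for sentiment work.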
