Why must text data be pre-processed?

This article was published as a part of the Data Science Blogathon

Introduction

Language is a structured medium that we humans use to communicate with each other, in the form of speech or text. “Blah blah”, “Meh”, “zzzz…” Yup, we can understand these words. But the question is, “Can computers understand these?” Nope, machines can’t understand these. In fact, machines can’t understand any text data at all, be it the word “blah” or the word “machine”. They only understand numbers. So, over the decades, scientists have been researching how to make machines understand our language, and out of that research came the techniques we call Natural Language Processing, or NLP.
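To make the "machines only understand numbers" point concrete, here is a minimal Python sketch. It first shows that a word is already stored as a sequence of byte values, and then maps each word of a sentence to an integer ID. The helper name `text_to_ids` and the sample sentence are made up for illustration, and the alphabetical ID assignment is just one simple convention, not the way any particular NLP library does it.

```python
def text_to_ids(text):
    """Map each whitespace-separated word to an integer ID (hypothetical helper)."""
    tokens = text.lower().split()
    # Assumption: IDs are assigned in alphabetical order of the unique words.
    vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
    return vocab, [vocab[token] for token in tokens]


# Even a single word is stored as numbers: its UTF-8 byte values.
print(list("blah".encode("utf-8")))  # [98, 108, 97, 104]

# A whole sentence turned into a list of integer IDs.
vocab, ids = text_to_ids("machines only understand numbers not words")
print(vocab)
print(ids)  # [0, 3, 4, 2, 1, 5]
```

The byte values alone carry no meaning about the word, which is why text usually needs further processing before a model can learn anything useful from it.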

What is Natural Language Processing?

Natural language processing, or NLP, is a branch of Artificial Intelligence that deals with the interaction between computers and human language. NLP combines computational linguistics with statistical, machine learning, and deep learning models, allowing computers to process and interpret language. It helps computers extract useful information from text data. Some of the real-world applications of NLP are: