Tokenization and Text Normalization

Objective: Text data is a type of unstructured data used in natural language processing. Understand how to preprocess text data before feeding it to machine learning algorithms.

Introduction: Text data is a form of unstructured data. The most prominent examples on the internet are social media data such as tweets, posts, and comments; conversation data such as messages, emails, and chats; and article data such as news articles and blogs. Note: If you […]

NLP Essentials: Removing Stopwords and Performing Text Normalization using NLTK and spaCy in Python

Overview: Learn how to remove stopwords and perform text normalization in Python, an essential Natural Language Processing (NLP) read. We will explore the different methods for removing stopwords and discuss text normalization techniques such as stemming and lemmatization. Put your theory into practice by performing stopword removal and text normalization in Python using the popular NLTK, spaCy, and Gensim libraries.

Introduction: Don’t you love how wonderfully diverse Natural Language Processing (NLP) is? Things we never imagined […]
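As a rough illustration of the stopword removal and stemming steps this article covers, here is a minimal sketch in plain Python. It does not use NLTK, spaCy, or Gensim: the stopword set, the suffix list, and the function names (tokenize, remove_stopwords, naive_stem, normalize) are all hypothetical stand-ins chosen for this example, and the stemmer is a crude suffix stripper rather than a real algorithm such as Porter's.

```python
import re

# Hypothetical, hand-picked stopword set for illustration only;
# NLTK, spaCy, and Gensim each ship their own, much larger lists.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were",
             "of", "to", "in", "and", "we", "it"}

# Suffixes stripped by the toy stemmer, checked longest first.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the illustrative stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip at most one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Tokenize, remove stopwords, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stopwords(tokenize(text))]


if __name__ == "__main__":
    print(normalize("We were removing the stopwords and testing tokens"))
    # prints ['remov', 'stopword', 'test', 'token']
```

Note that the stem "remov" is not a real word, which is typical of stemming. A lemmatizer would instead map the token to a dictionary form such as "remove", which requires a vocabulary lookup that this sketch does not attempt.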
