Beginner’s Guide To Text Classification Using PyCaret

Introduction Have you ever solved a Machine Learning problem in just one go? Solving a problem using machine learning isn’t straightforward. It involves various steps to come up with an accurate solution. The process/steps to be followed for solving an ml problem is known as ML Pipeline/ML Cycle. ML Pipeline/ ML Cycle (Credits: https://medium.com/analytics-vidhya/machine-learning-development-life-cycle-dfe88c44222e) As shown in the figure, the Machine Learning pipeline consists of different steps like: Understand Problem Statement, Hypothesis Generation, Exploratory Data Analysis, Data Preprocessing, Feature Engineering, […]

Read more

Getting started with NLP using NLTK Library

1010010   01101001   01110100   01101000   01101001  01101011   01100001 Did you understand the above binary code? If yes, then you’re a computer. If no, then you’re a Human. 🙂 I know it’s a difficult task for us to understand binary code just like computers because binary code is a Machine Understandable Language. Likewise, even computers don’t understand human language. So, how to make computers understand human language? The answer is Natural Language Processing. With the help of NLP, we can teach computers […]

Read more

Why must text data be pre-processed ?

This article was published as a part of the Data Science Blogathon Introduction Language is a structured medium we humans use to communicate with each other. Language can be in the form of speech or text. “Blah blah”, “Meh”, “zzzz…” Yup, we can understand these words. But the question is, “Can computers understand these?” Nop, machines can’t understandthese. In fact, machines can’t understand any text data at all, be it the word “blah” or the word “machine”. They only understand numbers. […]

Read more

NLTK: A Beginners Hands-on Guide to Natural Language Processing

This article was published as a part of the Data Science Blogathon Introduction:  NLTK is a toolkit build for working with NLP in Python. It provides us various text processing libraries with a lot of test datasets. A variety of tasks can be performed using NLTK such as tokenizing, parse tree visualization, etc… In this article, we will go through how we can set up NLTK in our system and use them for performing various NLP tasks during the text processing […]

Read more

FuzzyWuzzy Python Library: Interesting Tool for NLP and Text Analytics

This article was published as a part of the Data Science Blogathon Introduction There are many ways to compare text in python. But, often we search for an easy way to compare text. Comparing text is needed for various text analytics and Natural Language Processing purposes. One of the easiest ways of comparing text in python is using the fuzzy-wuzzy library. Here, we get a score out of 100, based on the similarity of the strings. Basically, we are given the similarity […]

Read more

Text Analysis with Spacy to Master NLP techniques

This article was published as a part of the Data Science Blogathon Natural Language Processing(NLP) is a branch of Artificial Intelligence that deals with Daily Language. Have you ever wonder how Alexa, Siri, Google Assistant understand us with voice and respond to us. Human Language is the fuzziest and complex. As they receive text input first preprocessing of text happens and many techniques are embedded which lets them understand grammar. In this tutorial, we will study some techniques which are helpful […]

Read more

Practical Guide to Word Embedding System

This article was published as a part of the Data Science Blogathon Pre-requisites – Basic knowledge of Python – Understanding of basics of NLP(Natural Language Processing)   Introduction In natural language processing, word embedding is used for the representation of words for Text Analysis, in the form of a vector that performs the encoding of the meaning of the word such that the words which are closer in that vector space are expected to have similar in mean. Consider, boy-men vs […]

Read more

Part 2: Topic Modeling and Latent Dirichlet Allocation (LDA) using Gensim and Sklearn

This article was published as a part of the Data Science Blogathon Introduction In the previous article, we had started with understanding the basic terminologies of text in Natural Language Processing(NLP), what is topic modeling, its applications, the types of models, and the different topic modeling techniques available. Let’s continue from there, explore Latent Dirichlet Allocation (LDA), working of LDA, and its similarity to another very popular dimensionality reduction technique called Principal Component Analysis (PCA).   Table of Contents A Little […]

Read more

Topic modeling With Naive Bayes Classifier

This article was published as a part of the Data Science Blogathon Introduction Naive Bayes is a powerful tool that leverages Bayes’ Theorem to understand and mimic complex data structures. In recent years, it has commonly been used for Natural Language Processing (NLP) tasks, such as text categorization. Today, we will be constructing a Naive Bayes text classifier for topic categorization. Before we move forward with the explanation, I want to emphasize that Naive Bayes is not the traditional method of […]

Read more

Sentiment Analysis using NLTK – A Practical Approach

This article was published as a part of the Data Science Blogathon Introduction The ultimate goal of this blog is to predict the sentiment of a given text using python where we use NLTK aka Natural Language Processing Toolkit, a package in python made especially for text-based analysis. So with a few lines of code, we can easily predict whether a sentence or a review(used in the blog) is a positive or a negative review. Before moving on to the implementation […]

Read more
1 2 3 4