Step by step guide to extract insights from free text (unstructured data)

Text Mining is one of the most complex analysis in the industry of analytics. The reason for this is that, while doing text mining, we deal with unstructured data. We do not have clearly defined observation and variables (rows and columns). Hence, for doing any kind of analytics, you need to first convert this unstructured data into a structured dataset and then proceed with normal modelling framework. The additional step of converting an unstructured data into a structured format is […]

Read more

Step by step guide to building sentiment analysis model using graphlab

I have been using graph lab for quite some time now. The first Kaggle competition I used it for was Click Trough Rate (CTR) and I was amazed to see the speed at which it can crunch such big data. Over last few months, I have realised much broader applications of GraphLab. In this article I will take up the text mining capability of GraphLab and solve one of the Kaggle problems. I will be referring to this problem with […]

Read more

Text Mining Simplified – IPL 2020 Tweet Analysis with R

This article was published as a part of the Data Science Blogathon. Introduction Text mining utilizes different AI technologies to automatically process data and generate valuable insights, enabling companies to make data-driven decisions. Text mining identifies facts, relationships, and assertions that would otherwise remain buried in the mass of textual big data. Once extracted, this information is converted into a structured form that can be further analyzed, or presented directly using clustered HTML tables, mind maps, charts, etc. Advantages of […]

Read more

Steps for effective text data cleaning (with case study using Python)

Introduction   The days when one would get data in tabulated spreadsheets are truly behind us. A moment of silence for the data residing in the spreadsheet pockets. Today, more than 80% of the data is unstructured – it is either present in data silos or scattered around the digital archives. Data is being produced as we speak – from every conversation we make in the social media to every content generated from news sources. In order to produce any […]

Read more

30 Questions to test a data scientist on Natural Language Processing [Solution: Skilltest – NLP]

Introduction Humans are social animals and language is our primary tool to communicate with the society. But, what if machines could understand our language and then act accordingly? Natural Language Processing (NLP) is the science of teaching machines how to understand the language we humans speak and write. We recently launched an NLP skill test on which a total of 817 people registered. This skill test was designed to test your knowledge of Natural Language Processing. If you are one […]

Read more

Framework to build a niche dictionary for text mining

Having the right dictionary is at the heart of any text mining analysis. Dictionary for text mining can be compared to maps while travelling in a new city. The more precise and accurate maps you use, the faster you reach to the destination. On the other hand, a wrong or incomplete map can end up confusing the traveler. Use of dictionary helps us convert unstructured text into structured data. The more precise dictionary you have for the analysis, the more accurate […]

Read more

Kaggle Solution: What’s Cooking ? (Text Mining Competition)

Introduction Tutorial on Text Mining, XGBoost and Ensemble Modeling in R I came across What’s Cooking competition on Kaggle last week. At first, I was intrigued by its name. I checked it and realized that this competition is about to finish. My bad! It was a text mining competition.  This competition went live for 103 days and ended on 20th December 2015. Still, I decided to test my skills. I downloaded the data set, built a model and managed to get a score of […]

Read more

An Introduction to Text Summarization using the TextRank Algorithm (with Python implementation)

Introduction Text Summarization is one of those applications of Natural Language Processing (NLP) which is bound to have a huge impact on our lives. With growing digital media and ever growing publishing – who has the time to go through entire articles / documents / books to decide whether they are useful or not? Thankfully – this technology is already here. Have you come across the mobile app inshorts? It’s an innovative news app that converts news articles into a […]

Read more

Build a word cloud using text mining tools of R

 This is how a word cloud of our entire website looks like! A word cloud is a graphical representation of frequently used words in a collection of text files. The height of each word in this picture is an indication of frequency of occurrence of the word in the entire text. By the end of this article, you will be able to make a word cloud using R on any given set of text files. Such diagrams are very useful when doing […]

Read more

6 Practices to enhance the performance of a Text Classification Model

Introduction A few months back, I was working on creating a sentiment classifier for Twitter data. After trying the common approaches, I was still struggling to get good accuracy on the results. Text classification problems and algorithms have been around for a while now. They are widely used for Email Spam Filtering by the likes of Google and Yahoo, for conducting sentiment analysis of twitter data and automatic news categorization in google alerts. However, while dealing with enormous amount of text […]

Read more
1 2