Step-by-step guide to extract insights from free text (unstructured data)

Text mining is one of the most complex kinds of analysis in the analytics industry. The reason is that text mining deals with unstructured data: we do not have clearly defined observations and variables (rows and columns). Hence, for any kind of analytics, you first need to convert this unstructured data into a structured dataset and then proceed with the normal modelling framework. The additional step of converting unstructured data into a structured format is […]
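
As a rough illustration of the conversion described in this excerpt, the sketch below turns free text into rows (documents) and columns (terms) by counting words. It is a minimal Python example, not the method from the linked article; the function names and the sample sentences are assumptions made up for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, rows), with one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The columns are the sorted set of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical sample documents, made up for illustration.
docs = [
    "Text mining deals with unstructured data",
    "Structured data has rows and columns",
]
vocabulary, rows = document_term_matrix(docs)

# Each document is now an observation and each term a variable.
for doc, row in zip(docs, rows):
    print(doc, "->", dict(zip(vocabulary, row)))
```

Once the text is in this shape, each row can be fed to an ordinary modelling workflow like any other structured dataset.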

Information Retrieval System explained in simple terms!

Introduction: While searching for things on the internet, I always wondered what kind of algorithms run behind search engines to provide us with the most relevant information. How do they decide which results to show for a given set of search keywords? This might be a no-brainer for a few people, but it is definitely an interesting problem for some of the best brains around the world. To find the answer, I read every guide, tutorial, and piece of learning material that came my way. Eventually, I learnt […]

Text Mining hack: Subject Extraction made easy using Google API

Let’s do a simple exercise. You need to identify the subject and the sentiment in the following sentences: (1) Google is the best resource for any kind of information. (2) I came across a fabulous knowledge portal – Analytics Vidhya. (3) Messi played well but Argentina still lost the match. (4) Opera is not the best browser. (5) Yes, like UAE will win the Cricket World Cup. Was this exercise simple? Even if it looks simple, now imagine creating an algorithm to do this. How does that […]

Framework to build a niche dictionary for text mining

Having the right dictionary is at the heart of any text mining analysis. A dictionary for text mining can be compared to a map when travelling in a new city. The more precise and accurate the map you use, the faster you reach your destination. On the other hand, a wrong or incomplete map can end up confusing the traveler. Using a dictionary helps us convert unstructured text into structured data. The more precise the dictionary you have for the analysis, the more accurate […]

Tapping Twitter Sentiments: A Complete Case-Study on 2015 Chennai Floods

Introduction: We did this case study as part of our capstone project at Great Lakes Institute of Management, Chennai. After we presented this study, we got an overwhelming response from our professors & mentors. Later, they encouraged us to share our work to help others learn something new. We’ve been following Analytics Vidhya for a while now. Everyone knows it’s probably the largest engine for sharing analytics knowledge. We tried and got lucky in connecting with their content team. So, […]

The Ultimate Learning Path to Becoming a Data Scientist in 2018

Introduction So you’ve taken the plunge. You want to become a data scientist. But where to begin? There are far too many resources out there. How do you decide the starting point? Did you miss out on topics you should have studied? Which are the best resources to learn? Don’t worry, we have you covered! Analytics Vidhya’s learning path for 2016 saw 250,000+ views. In 2017, we went even further and saw an incredible 500,000+ views! So this year, we […]

Who is the world cheering for? 2014 FIFA WC winner predicted using Twitter feed (in R)

Sports are filled with emotions! The cheering of the audience and the reactions to events on various media channels are some of the factors that make a huge impact on the minds of the players. If people support you, your chances of winning are greatly enhanced. A live example of this is the record of the Indian cricket team playing in India and abroad: its win rate in India is approximately twice its win rate abroad. Football is again a game driven largely by emotions. […]

Build a word cloud using text mining tools of R

This is what a word cloud of our entire website looks like! A word cloud is a graphical representation of the frequently used words in a collection of text files. The height of each word in the picture indicates how frequently the word occurs in the entire text. By the end of this article, you will be able to make a word cloud using R on any given set of text files. Such diagrams are very useful when doing […]
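
The idea that word height tracks frequency can be sketched as follows. This is a minimal Python illustration rather than the R workflow the linked article uses; the stop-word list, the size range, the function names, and the sample texts are all assumptions chosen for the example.

```python
from collections import Counter
import re

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def word_frequencies(texts):
    """Count how often each non-stop word occurs across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


def font_sizes(counts, min_size=10, max_size=60):
    """Scale counts linearly so that the most frequent word gets max_size."""
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


# Hypothetical sample texts, made up for illustration.
texts = ["Text mining in R", "A word cloud of text"]
counts = word_frequencies(texts)
sizes = font_sizes(counts)

# List the words from largest to smallest, as a word cloud would draw them.
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}")
```

A plotting library would then place each word at its computed size; only the frequency-to-size step is shown here.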

Replicating Human Memory Structures in Neural Networks to Create Precise NLU algorithms

Introduction: Machine learning and artificial intelligence developments are happening at breakneck speed! At such a pace, you need to understand the developments at multiple levels – you obviously need to understand the underlying tools and techniques, but you also need to develop an intuitive understanding of what is happening. By the end of this article, you will develop an intuitive understanding of RNNs, especially LSTM & GRU. Ready? Let’s go! Table of Contents: Simple exercise – Tweet classification; How does […]

Hacks to perform faster Text Mining in R

Introduction: Data science demands versatility. Move away from your regular methods, challenge your ways of working, and explore new ways of doing things more efficiently. Reminiscing about my initial years in data science, I too got trapped by this devil of ‘complacency’. At one point, I was not challenging myself enough. I wasn’t experimenting with ways of doing my work. I accepted things as they were, until I realized ‘Complacency is a state of mind […]
