FuzzyWuzzy Python Library: Interesting Tool for NLP and Text Analytics

This article was published as a part of the Data Science Blogathon Introduction There are many ways to compare text in python. But, often we search for an easy way to compare text. Comparing text is needed for various text analytics and Natural Language Processing purposes. One of the easiest ways of comparing text in python is using the fuzzy-wuzzy library. Here, we get a score out of 100, based on the similarity of the strings. Basically, we are given the similarity […]

Read more

How to Perform Basic Text Analysis without Training Dataset

This article was published as a part of the Data Science Blogathon Overview This article will give you a basic understanding of how text analysis works. Learn the various steps of the NLP pipeline Derivation of the overall sentiment of the text. Dashboard depicting the general statistics and sentiment analysis of the text. Abstract In this modern digital era, a large amount of information is generated per second. Most of the data humans generate through WhatsApp messages, tweets, blogs, news articles, […]

Read more

Text Analytics of Resume Dataset with NLP!

This article was published as a part of the Data Science Blogathon Introduction We all have made our resumes at some point in time. In a resume, we try to include important facts about ourselves like our education, work experience, skills, etc. Let us work on a resume dataset today.  The text we put in our resume speaks a lot about us. For example, our education, skills, work experience, and other random information about us are all present in a resume. […]

Read more

Step by step guide to extract insights from free text (unstructured data)

Text Mining is one of the most complex analysis in the industry of analytics. The reason for this is that, while doing text mining, we deal with unstructured data. We do not have clearly defined observation and variables (rows and columns). Hence, for doing any kind of analytics, you need to first convert this unstructured data into a structured dataset and then proceed with normal modelling framework. The additional step of converting an unstructured data into a structured format is […]

Read more

Build Text Categorization Model with Spark NLP

Overview Setting up John Snow labs Spark-NLP on AWS EMR and using the library to perform a simple text categorization of BBC articles. Introduction Natural Language Processing is one of the important processes for data science teams across the globe. With ever-growing data, most of the organizations have already moved to big data platforms like Apache Hadoop and cloud offerings like AWS, Azure, and GCP. These platforms are more than capable of handling    

Read more

Measuring Audience Sentiments about Movies using Twitter and Text Analytics

Introduction The practice of using analytics to measure movie’s success is not a new phenomenon. Most of these predictive models are based on structured data with input variables such as Cost of Production, Genre of the Movie, Actor, Director, Production House, Marketing expenditure, no of distribution platforms, etc. However, with the advent of social media platforms, young demographics, digital media and the increasing adoption of platforms like Twitter, Facebook, etc to express views and opinions. Social Media has become a […]

Read more

Build a word cloud using text mining tools of R

 This is how a word cloud of our entire website looks like! A word cloud is a graphical representation of frequently used words in a collection of text files. The height of each word in this picture is an indication of frequency of occurrence of the word in the entire text. By the end of this article, you will be able to make a word cloud using R on any given set of text files. Such diagrams are very useful when doing […]

Read more