An Introduction to Text Summarization using the TextRank Algorithm (with Python implementation)

Text Summarization is one of those applications of Natural Language Processing (NLP) that are bound to have a huge impact on our lives. With digital media growing and more being published every day, who has the time to go through entire articles, documents or books to decide whether they are useful? Thankfully, this technology is already here. Have you come across the mobile app inshorts? It’s an innovative news app that converts news articles into a […]

Top 5 Data Science GitHub Repositories and Reddit Discussions (January 2019)

There’s nothing quite like GitHub and Reddit for data science. Both platforms have been of immense help to me in my data science journey. GitHub is the ultimate one-stop platform for hosting your code, and it makes collaboration between team members much easier. Most leading data scientists and organizations use GitHub to open-source their libraries and frameworks. So not only do we stay up to date with the latest developments in our field, we also get to replicate their models on our […]

Create Natural Language Processing based Apps for iOS in Minutes! (using Apple’s Core ML 3)

Intrigued by Apple’s iOS apps? Learn how to build Natural Language Processing (NLP) iOS apps in this article. We’ll be using Apple’s Core ML 3 to build these NLP iOS apps in a hands-on, step-by-step tutorial with code. I love working in the Natural Language Processing (NLP) space. The last couple of years have been a goldmine for me; the level and quality of developments have been breathtaking. But this comes with its […]

Build a word cloud using text mining tools of R

This is what a word cloud of our entire website looks like! A word cloud is a graphical representation of frequently used words in a collection of text files. The height of each word in the picture indicates how often that word occurs in the entire text. By the end of this article, you will be able to make a word cloud in R from any given set of text files. Such diagrams are very useful when doing […]

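The frequency-to-size idea described above can be sketched in a few lines. The article itself uses R; the snippet below is only a minimal Python illustration of the same principle, assuming a simple linear mapping from word count to display size. The stopword list, the `word_sizes` helper and its `min_size`/`max_size` parameters are hypothetical choices for this sketch, not part of the original article or of any word cloud library.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; real tools use much larger lists)
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def word_sizes(texts, min_size=10, max_size=60):
    """Map each word to a display size proportional to its frequency.

    Hypothetical helper: sizes are scaled linearly between min_size and
    max_size, so the most frequent word gets max_size.
    """
    words = []
    for text in texts:
        # Lowercase the text and keep only alphabetic tokens that are not stopwords
        words.extend(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        w: min_size + (c - lo) * (max_size - min_size) / span
        for w, c in counts.most_common()
    }


docs = ["Data science is fun.", "Data science with R and data mining."]
sizes = word_sizes(docs)
print(sizes["data"])  # → 60.0
```

Here "data" occurs three times and gets the largest size; a real word cloud package would also handle layout and rendering, which this sketch leaves out.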

6 Practices to enhance the performance of a Text Classification Model

A few months back, I was creating a sentiment classifier for Twitter data. After trying the common approaches, I was still struggling to get good accuracy. Text classification problems and algorithms have been around for a while now. They are widely used for email spam filtering by the likes of Google and Yahoo, for sentiment analysis of Twitter data, and for automatic news categorization in Google Alerts. However, while dealing with an enormous amount of text […]

Extracting information from reports using Regular Expressions Library in Python

It is often necessary to extract key information from reports, articles, papers and similar documents: for example, company names and prices from financial reports, judges’ names and jurisdictions from court judgments, or account numbers from customer complaints. These extractions are part of text mining and are essential for converting unstructured data into a structured form that is later used for analytics and machine learning. Such entity extraction uses approaches like ‘lookup’, ‘rules’ and ‘statistical/machine learning’. In ‘lookup’ based approaches, […]

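For the ‘rules’ based approach, a regular expression is a natural fit. Below is a minimal sketch using Python’s built-in `re` module to pull account numbers out of complaint text. The account number format (an "account", "acct" or "a/c" keyword followed by 8 to 12 digits), the `extract_account_numbers` helper and the sample complaint are all hypothetical assumptions; real reports would need patterns tuned to their own formats.

```python
import re

# Hypothetical rule: the keyword "account", "acct" or "a/c", an optional
# "no."/"number", an optional ":" or "#", then 8-12 digits. Adjust to your data.
ACCOUNT_PATTERN = re.compile(
    r"\b(?:account|acct|a/c)\s*(?:no\.?|number)?\s*[:#]?\s*(\d{8,12})\b",
    re.IGNORECASE,
)


def extract_account_numbers(text):
    """Return every account number matched by the (assumed) rule above."""
    # findall returns only the captured digit group for each match
    return ACCOUNT_PATTERN.findall(text)


# Made-up sample complaint, for illustration only
complaint = (
    "My account no. 12345678 was charged twice. "
    "The refund went to A/C: 987654321012 instead."
)
print(extract_account_numbers(complaint))  # → ['12345678', '987654321012']
```

Because `findall` returns only the captured group, the keyword and separators are dropped from the result, and digit runs shorter than 8 or longer than 12 are rejected by the word-boundary checks.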

An NLP Approach to Mining Online Reviews using Topic Modeling (with Python codes)

E-commerce has revolutionized the way we shop. That phone you’ve been saving up to buy for months? It’s just a search and a few clicks away. Items are delivered within a matter of days (sometimes even the next day!). For online retailers, there are no constraints related to inventory management or space management. They can sell as many different products as they want. Brick-and-mortar stores can keep only a limited number of products due to the finite space […]

Introduction to StanfordNLP: An Incredible State-of-the-Art NLP Library for 53 Languages (with Python code)

A common challenge I came across while learning Natural Language Processing (NLP) was this: can we build models for non-English languages? For quite a long time, the answer was no. Each language has its own grammatical patterns and linguistic nuances, and there just aren’t many datasets available in other languages. That’s where Stanford’s latest NLP library, StanfordNLP, steps in. I could barely contain my excitement when I read the news last week. The authors claimed StanfordNLP could support more […]

DataHack Radio #23: Ines Montani and Matthew Honnibal – The Brains behind spaCy

https://soundcloud.com/datahack-radio/ines-montani-matthew-honnibal-the-brains-behind-spacy
What would you do if you had the chance to pick the brains behind one of the most popular Natural Language Processing (NLP) libraries of our era? A library that has helped usher in the current boom in NLP applications and nurtured tons of NLP scientists? Well, you invite the creators onto our popular DataHack Radio podcast and let them do the talking! We are delighted to welcome Ines Montani and Matt Honnibal, the developers of spaCy […]

An Essential Guide to Pretrained Word Embeddings for NLP Practitioners

Understand the importance of pretrained word embeddings, learn about two popular types of pretrained word embeddings (Word2Vec and GloVe), and compare the performance of pretrained word embeddings against learning embeddings from scratch. How do we make machines understand text data? We know that machines are supremely adept at working with numerical data, but they become sputtering instruments if we feed raw text data to them. The idea is to create a representation of words […]