Building a FAQ Chatbot in Python – The Future of Information Searching

Introduction What do we do when we need any information? Simple: “We Ask, and Google Tells”. But if the answer depends on multiple variables, then the existing Ask-Tell model tends to sputter. State of the art search engines usually cannot handle such requests. We would have to search for information available in bits and pieces and then try to filter and assemble relevant parts together. Sounds time consuming, doesn’t it? Source: Inbenta This Ask-Tell model is evolving rapidly with the […]

Read more

A Comprehensive Guide to Understand and Implement Text Classification in Python

Improving Text Classification Models While the above framework can be applied to a number of text classification problems, but to achieve a good accuracy some improvements can be done in the overall framework. For example, following are some tips to improve the performance of text classification models and this framework. 1. Text Cleaning : text cleaning can help to reducue the noise present in text data in the form of stopwords, punctuations marks, suffix variations etc. This article can help to understand how […]

Read more

Introduction to Flair for NLP: A Simple yet Powerful State-of-the-Art NLP Library

Introduction Last couple of years have been incredible for Natural Language Processing (NLP) as a domain! We have seen multiple breakthroughs – ULMFiT, ELMo, Facebook’s PyText, Google’s BERT, among many others. These have rapidly accelerated the state-of-the-art research in NLP (and language modeling, in particular). We can now predict the next sentence, given a sequence of preceding words. What’s even more important is that machines are now beginning to understand the key element that had eluded them for long. Context! Understanding context […]

Read more

Innoplexus Sentiment Analysis Hackathon: Top 3 Out-of-the-Box Winning Approaches

Overview Hackathons are a wonderful opportunity to gauge your data science knowledge and compete to win lucrative prizes and job opportunities Here are the top 3 approaches from the Innoplexus Sentiment Analysis Hackathon – a superb NLP challenge   Introduction I’m a big fan of hackathons. I’ve learned so much about data science from participating in these hackathons in the past few years. I’ll admit it – I have gained a lot of knowledge through this medium and this, in […]

Read more

How to use a Machine Learning Model to Make Predictions on Streaming Data using PySpark

Overview Streaming data is a thriving concept in the machine learning space Learn how to use a machine learning model (such as logistic regression) to make predictions on streaming data using PySpark We’ll cover the basics of Streaming Data and Spark Streaming, and then dive into the implementation part   Introduction Picture this – every second, more than 8,500 Tweets are sent, more than 900 photos are uploaded on Instagram, more than 4,200 Skype calls are made, more than 78,000 […]

Read more

Words that matter! A Simple Guide to Keyword Extraction in Python

This article was published as a part of the Data Science Blogathon. Introduction Unstructured data contains a plethora of information. It is like energy when harnessed, will create high value for its stakeholders. A lot of work is already being done in this area by various companies. There is no doubt that the unstructured data is noisy and significant work has to be done to clean, analyze, and make them meaningful to use. This article talks about an area which […]

Read more

Information Retrieval System explained in simple terms!

Introduction While searching for things over internet, I always wondered, what kind of algorithms might be running behind these search engines which provide us with the most relevant information? How do they decide which result to show for which set of search keywords. This might be a no brainer for a few people, but definitely an interesting problem for some of the best brains around the world. To find the answer, I read every guide, tutorial, learning material that came my way. Eventually, I learnt […]

Read more

Ultimate Guide to Understand and Implement Natural Language Processing (with codes in Python)

Overview Complete guide on natural language processing (NLP) in Python Learn various techniques for implementing NLP including parsing & text processing Understand how to use NLP for text feature engineering   Introduction According to industry estimates, only 21% of the available data is present in structured form. Data is being generated as we speak, as we tweet, as we send messages on Whatsapp and in various other activities. Majority of this data exists in the textual form, which is highly unstructured […]

Read more

Complete tutorial on Text Classification using Conditional Random Fields Model (in Python)

Introduction The amount of text data being generated in the world is staggering. Google processes more than 40,000 searches EVERY second!  According to a Forbes report, every single minute we send 16 million text messages and post 510,00 comments on Facebook. For a layman, it is difficult to even grasp the sheer magnitude of data out there? News sites and other online media alone generate tons of text content on an hourly basis. Analyzing patterns in that data can become […]

Read more

DataHack Radio #21: Detecting Fake News using Machine Learning with Mike Tamir, Ph.D.

Introduction Fake news is one of the biggest scourges in our digitally connected world. That is no exaggeration. It is no longer limited to little squabbles – fake news spreads like wildfire and is impacting millions of people every day. How do you deal with such a sensitive issue? Millions of articles are being churned out every day on the internet – how do you tell real from fake? It’s not as easy as turning to a simple fact checker. […]

Read more
1 2 3 4 7