Part 7: Step by Step Guide to Master NLP – Word Embedding in Detail

This article was published as a part of the Data Science Blogathon. Introduction: This article is part of an ongoing blog series on Natural Language Processing (NLP). In the previous articles (parts 5 and 6), we covered different text vectorization and word embedding techniques in detail. In this article, we will first discuss the co-occurrence matrix, which is also a word vectorization technique, and then new concepts related to word embedding, including: Applications of […]

Practical Guide to Word Embedding System

This article was published as a part of the Data Science Blogathon. Pre-requisites: basic knowledge of Python; an understanding of the basics of NLP (Natural Language Processing). Introduction: In natural language processing, word embedding represents words for text analysis as vectors that encode the meaning of each word, such that words that are closer together in the vector space are expected to be similar in meaning. Consider, boy-men vs […]

Build your own AI chatbot from scratch!

This article was published as a part of the Data Science Blogathon. Introduction: It’s pretty simple! Today we will learn to create an AI chatbot from scratch using intent matching and NLP algorithms. Let’s see what we are going to do:
* Prepare our dataset with questions (keywords) and their respective intents.
* Prepare a JSON file containing replies for each intent.
* Transform our data into TF-IDF vectors.
* Use a deep neural network to classify the user’s question into one of the intents […]

Text Preprocessing in NLP with Python codes

This article was published as a part of the Data Science Blogathon. Introduction: Natural Language Processing (NLP) is a branch of data science that deals with text data. Alongside numerical data, text data is widely available and is used to analyze and solve business problems. But before the data can be used for analysis or prediction, it must be processed. To prepare text data for model building, we perform text preprocessing. It is the very first […]

Generate Questions from Movies!

This article was published as a part of the Data Science Blogathon. Have you ever thought of generating questions from the SRT files of movies? I don’t know how useful this is, but as a beginner I was pretty excited to learn that it can be done. What is SRT? In simple terms, the subtitles you see on Amazon Prime, Netflix, Hotstar, HBO, etc. are saved in a text file with the .srt extension, along with timestamps. The timestamp […]

SMS Spam Detection Using LSTM – A Hands On Guide!

Introduction: In today’s world, almost everyone uses a mobile phone and receives messages (SMS/email) on it daily. Many of these messages are spam, and only a few are ham, that is, wanted messages. In this article, we are going to create an SMS spam detection model using LSTM, which will help you determine whether an SMS is spam or not. About Dataset: Here we […]

Build your own NLP based search engine Using BM25

Introduction: Ever wondered how search engines like Google and Yahoo work? And ever thought about how they can scan the whole internet and return relevant results, as in "About 5,43,00,000 results (0.004 seconds)"? Well, they work on the concepts of crawling and indexing. Crawling: Automated bots look for pages that are new or updated, and store key information from those pages, such as URL, title, and keywords, to be used later. Indexing: Data captured from crawling is analyzed […]

Top 15 Open-Source Datasets of 2020 that every Data Scientist Should add to their Portfolio!

Overview: Here is a list of the Top 15 Datasets for 2020 that we feel every data scientist should practice on. The article contains 5 datasets each for machine learning, computer vision, and NLP. By no means is this list exhaustive. Feel free to add other datasets in the comments below. Introduction: "For the things we have to learn before we can do them, we learn by doing them." - Aristotle. I am sure everyone can attest to this saying. No […]

Step by step guide to building sentiment analysis model using graphlab

I have been using GraphLab for quite some time now. The first Kaggle competition I used it for was Click-Through Rate (CTR), and I was amazed at the speed at which it can crunch such big data. Over the last few months, I have realised much broader applications of GraphLab. In this article I will take up the text mining capability of GraphLab and solve one of the Kaggle problems. I will be referring to this problem with […]

Natural Language Processing Made Easy – using SpaCy (in Python)

Introduction: Natural Language Processing is one of the principal areas of Artificial Intelligence. NLP plays a critical role in many intelligent applications such as automated chatbots, article summarizers, multilingual translation, and opinion identification from data. Every industry that uses NLP to make sense of unstructured text data demands not just accuracy but also speed in obtaining results. Natural Language Processing is a broad field; some of the tasks in NLP are text classification, entity detection, machine translation, question […]
