Getting started with NLP using NLTK Library

1010010 01101001 01110100 01101000 01101001 01101011 01100001 Did you understand the binary code above? If yes, then you’re a computer. If no, then you’re a human. 🙂 Binary code is hard for us to read the way computers do, because it is a machine-readable language. Likewise, computers don’t understand human language. So how do we make computers understand human language? The answer is Natural Language Processing. With the help of NLP, we can teach computers […]
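
As a small aside, the binary string in the teaser above can be decoded with a few lines of Python. This is only an illustrative sketch: the helper names `decode_binary` and `encode_binary` are made up for this example, and it assumes each whitespace-separated group is the code point of one ASCII character. The first group in the teaser has seven bits rather than eight, which `int(..., 2)` handles without complaint.

```python
def decode_binary(bits: str) -> str:
    """Turn whitespace-separated binary groups into text (hypothetical helper)."""
    # int(group, 2) parses a base-2 string; chr() maps the number to a character.
    return "".join(chr(int(group, 2)) for group in bits.split())


def encode_binary(text: str) -> str:
    """Inverse direction: one zero-padded 8-bit group per character."""
    return " ".join(format(ord(ch), "08b") for ch in text)


message = "1010010 01101001 01110100 01101000 01101001 01101011 01100001"
print(decode_binary(message))
print(encode_binary("Hi"))
```

Running it prints the decoded word, followed by the 8-bit groups for "Hi".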

Text Generation Using Bidirectional LSTM – A Walk-through in Tensorflow

This article was published as a part of the Data Science Blogathon. Text Generation: Text generation is a Natural Language Processing task that involves automatically generating meaningful text. We can also use text generation for autocomplete. Initially, we provide a prompt, which is the text used as the base for generation. The model generates text based on the prompt; the predicted text is appended to the base prompt, and it is fed again […]
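
The prompt-and-feed-back loop described in this teaser can be sketched in plain Python. This is not the article's bidirectional LSTM: as a loudly labeled stand-in, the sketch uses a toy bigram counter as the "model", and the names `train_bigram` and `generate` are hypothetical. Only the loop structure (predict, append to the prompt, feed again) is meant to carry over.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str):
    """Count which word follows which (toy stand-in for a trained network)."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(model, prompt: str, n_words: int) -> str:
    """Repeatedly predict the next word and append it to the running text."""
    text = prompt.split()
    for _ in range(n_words):
        followers = model.get(text[-1])  # .get avoids creating new keys
        if not followers:
            break  # the last word was never seen with a successor
        next_word = followers.most_common(1)[0][0]
        text.append(next_word)  # the prediction becomes part of the next input
    return " ".join(text)


model = train_bigram("the cat sat on the mat and the cat sat down")
print(generate(model, "the", 2))
```

A real text generator would swap the bigram lookup for a neural network's prediction and would usually sample from the predicted distribution instead of always taking the most frequent word.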

New Anaphora and Co-reference Resolution Technique for Biographies

This article was published as a part of the Data Science Blogathon. Introduction: Biographies of many famous personalities are very insightful and inspiring. However, one may not want to read the whole document. To get just the important points from a biography, one can generate a summary of it. The summary is generated by assigning weights to all the words. Sometimes the machine treats an anaphora as a separate word, which in turn produces a less […]

Malawi News Classification - An NLP Project

Classifying Malawi News articles into 19 different classes using SMOTE and SGDClassifier. Introduction: Text classification is common among the applications that we use on a daily basis. For example, email providers use text classification to filter spam emails out of your inbox. Another common use of text classification is in customer care, where sentiment analysis is used to differentiate bad reviews from good reviews (ADDI AI 2050). The list of modern uses of text classification goes on, as we have excelled to […]

Email Spam Detection – A Comparative Analysis of 4 Machine Learning Models

This article was published as a part of the Data Science Blogathon. Introduction: This article aims to compare four different deep learning and machine learning algorithms for building a spam detector and to evaluate their performance. The dataset we used was a shuffled sample of email subjects and bodies containing both spam and ham emails in various proportions, which we converted into lemmas. Email spam detection is one of the most effective projects of deep learning, but this is often also […]

Identifying The Language of A Document Using NLP!

This article was published as a part of the Data Science Blogathon. Introduction: The goal of this article is to identify the language of a written text. The text in documents is available in many languages, and when we don’t know the language, it sometimes becomes very difficult even to tell Google Translate what it is. For most translators, we have to specify both the input language and the desired language. If you had a text written in Spanish and you […]

Performing Sentiment Analysis Using Twitter Data!

Photo by Daddy Mohlala on Unsplash. “Data is water, purifying to make it edible is a role of Data Analyst” – Kashish Rastogi. We are going to clean the Twitter text data and visualize the data in this blog. Table of Contents: Problem Statement; Data Description; Cleaning text with NLP; Finding if the text has: with spaCy; Cleaning text with preprocessor library; Analysis of the sentiment of data; Data visualizing. I am taking the Twitter data which is available here on […]

Training BERT Text Classifier on Tensor Processing Unit (TPU)

Training Hugging Face’s most famous model on TPU for sentiment analysis of Tunisian Arabizi social media text. Introduction: Arabic speakers usually express themselves in their local dialect on social media, so Tunisians use Tunisian Arabizi, which consists of Arabic written in the Latin alphabet. Sentiment analysis relies on cultural knowledge and word sense with contextual information. We will be using both the Arabizi dialect and sentiment analysis to solve the problem in this project. The competition is hosted on Zindi which […]

Evaluating the Factual Consistency of Abstractive Text Summarization

factCC: Evaluating the Factual Consistency of Abstractive Text Summarization. Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher. Introduction: Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents. We propose a weakly-supervised, model-based approach for verifying factual consistency and identifying conflicts between source documents and a generated summary. Training data is generated by applying a series of rule-based transformations to the sentences of source documents. The factual consistency model is then trained jointly […]

Why must text data be pre-processed?

This article was published as a part of the Data Science Blogathon. Introduction: Language is a structured medium we humans use to communicate with each other. Language can be in the form of speech or text. “Blah blah”, “Meh”, “zzzz…” Yup, we can understand these words. But the question is, “Can computers understand these?” Nope, machines can’t understand these. In fact, machines can’t understand any text data at all, be it the word “blah” or the word “machine”. They only understand numbers. […]
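
To make "they only understand numbers" concrete, here is a minimal bag-of-words sketch in plain Python. It is an illustration under stated assumptions, not the method from the article: tokenization is just lowercasing and splitting on whitespace, and the helper names `build_vocab` and `to_counts` are hypothetical.

```python
def build_vocab(texts):
    """Assign each distinct token an integer index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_counts(text, vocab):
    """Encode a text as a list of per-token counts; unknown tokens are skipped."""
    vector = [0] * len(vocab)
    for token in text.lower().split():
        index = vocab.get(token)
        if index is not None:
            vector[index] += 1
    return vector


vocab = build_vocab(["blah blah", "meh machine"])
print(vocab)
print(to_counts("Blah machine blah", vocab))
```

Each text becomes a fixed-length list of integers, which is the kind of input a machine learning model can work with. Real pipelines typically add steps such as punctuation removal, stop-word filtering, and lemmatization before counting.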
