Part 16: Step by Step Guide to Master NLP – Topic Modelling using LSA

This article was published as a part of the Data Science Blogathon. Introduction: This article is part of an ongoing blog series on Natural Language Processing (NLP). In the previous article, we covered a basic Topic Modeling technique named Non-Negative Matrix Factorization. Continuing from that part, in this article we take a deep dive into another Topic Modeling technique named Latent Semantic Analysis […]
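The excerpt only names the technique; as a minimal illustrative sketch (the toy corpus, parameters, and use of scikit-learn below are assumptions for this listing, not code from the article), LSA amounts to running a truncated SVD on a TF-IDF document-term matrix and reading each component as a latent topic:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "cats and dogs are popular pets",
        "dogs chase cats in the yard",
        "python and java are programming languages",
        "I write python code every day",
    ]

    # Build the TF-IDF document-term matrix
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(docs)

    # LSA: truncated SVD of that matrix; each component acts as a "topic"
    lsa = TruncatedSVD(n_components=2, random_state=0)
    doc_topic = lsa.fit_transform(X)

    # Show the top terms per topic
    terms = tfidf.get_feature_names_out()
    for i, component in enumerate(lsa.components_):
        top = component.argsort()[-3:][::-1]
        print(f"Topic {i}:", [terms[j] for j in top])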

Read more

Part 20: Step by Step Guide to Master NLP – Information Retrieval

This article was published as a part of the Data Science Blogathon. Introduction: This article is part of an ongoing blog series on Natural Language Processing (NLP). In the previous article, we completed our discussion of Topic Modelling techniques. In this article, we turn to an important application of NLP: Information Retrieval. We will discuss the basic concepts of Information Retrieval along with some of the models used in it. NOTE: […]

Read more

Bag-of-words vs TFIDF vectorization – A Hands-on Tutorial

This article was published as a part of the Data Science Blogathon Whenever we apply any algorithm to textual data, we need to convert the text to a numeric form. Hence, there arises a need for some pre-processing techniques that can convert our text to numbers. Both bag-of-words (BOW) and TFIDF are pre-processing techniques that can generate a numeric form from an input text. Bag-of-Words: The bag-of-words model converts text into fixed-length vectors by counting how many times each word appears. […]
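As a quick hands-on illustration of the difference (the two example sentences and the use of scikit-learn below are illustrative choices, not the tutorial's own code), both vectorizers return fixed-length numeric vectors, but TF-IDF down-weights words that occur in every document:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = ["the cat sat on the mat", "the dog sat on the log"]

    # Bag-of-words: raw counts of each vocabulary word per document
    bow = CountVectorizer()
    counts = bow.fit_transform(docs)
    print(bow.get_feature_names_out())
    print(counts.toarray())

    # TF-IDF: the same counts re-weighted, so words shared by all
    # documents ("the", "sat", "on") contribute less
    tfidf = TfidfVectorizer()
    print(tfidf.fit_transform(docs).toarray().round(2))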

Read more

Vision Transformer for Fast and Efficient Scene Text Recognition

deep-text-recognition-benchmark: ViTSTR is a simple single-stage model that uses a pre-trained Vision Transformer (ViT) to perform Scene Text Recognition (ViTSTR). Its accuracy is comparable to state-of-the-art STR models although it uses significantly fewer parameters and FLOPS. ViTSTR is also fast due to the parallel computation inherent to the ViT architecture. ViTSTR is built using a fork of the CLOVA AI Deep Text Recognition Benchmark, whose original documentation is at the bottom. Below we document how to train and evaluate […]
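The repository's actual training and evaluation commands are not reproduced in this excerpt. The snippet below is only a conceptual PyTorch sketch of the single-stage, parallel per-character prediction the description refers to; it uses an untrained stand-in encoder in place of the pre-trained ViT backbone, and every size in it is an arbitrary assumption:

    import torch
    import torch.nn as nn

    # Stand-in for the ViTSTR idea: embed image patches as tokens, run a
    # transformer encoder once, and classify one character per output
    # position in parallel (no recurrent decoding step).
    class ToySTR(nn.Module):
        def __init__(self, num_chars=95, max_len=25, patch_dim=16 * 16 * 3, d_model=192):
            super().__init__()
            self.embed = nn.Linear(patch_dim, d_model)                  # patch -> token
            layer = nn.TransformerEncoderLayer(d_model, nhead=3, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=4)   # ViTSTR uses a pre-trained ViT here
            self.head = nn.Linear(d_model, num_chars)                   # shared character classifier
            self.max_len = max_len

        def forward(self, patches):                      # patches: (batch, num_patches, patch_dim)
            tokens = self.encoder(self.embed(patches))
            return self.head(tokens[:, : self.max_len])  # (batch, max_len, num_chars)

    logits = ToySTR()(torch.randn(2, 196, 16 * 16 * 3))
    print(logits.shape)  # torch.Size([2, 25, 95])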

Read more

A Simple Strong Baseline for TextVQA and TextCaps

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI 2021]. Citation: if you use ssbaseline in your work, please cite:

    @article{zhu2020simple,
      title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
      author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
      journal={arXiv preprint arXiv:2012.05153},
      year={2020}
    }

Installation: first install the repo using

    git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
    cd ~/ssbaseline
    python setup.py build develop

Getting Data: we provide SBD-Trans OCR for TextVQA and […]

Read more

Getting Started with Natural Language Processing using Python

This article was published as a part of the Data Science Blogathon. Why NLP? Natural Language Processing has always been a key tenet of Artificial Intelligence (AI). With the increase in the adoption of AI, systems to automate sophisticated tasks are being built. Some examples are described below. Diagnosing a rare form of cancer – At the University of Tokyo’s Institute of Medical Science, doctors used artificial intelligence to successfully diagnose a rare type of leukemia. The doctors used an AI […]

Read more

A Python parser that takes the content of a text file and reads it into variables

Text-File-Parser: a Python parser that takes the content of a text file and reads it into variables. Input.text file:

    1. What is your ***?
       1. 18 – 34
       2. 35 – 44
       3. 45 – 54
       4. 55 – 64
       5. Over 65
       6. Don’t know
    2. What *** do you live in?
       1. Ontario
       2. Quebec
       3. Manitoba
       4. Alberta
       5. Other

Given a plain text file as above, this Python script reads all the questions and their numbers, storing them into two […]
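The excerpt is cut off before naming the two containers, so the sketch below is only one possible reading (the regular expressions and the two lists are assumptions for this listing, not the repository's code): question lines are recognised by their trailing question mark, and each question's numbered options are collected alongside it:

    import re

    questions, options = [], []   # assumed containers; the excerpt stops at "two ..."

    with open("Input.text", encoding="utf-8") as f:
        current = None
        for raw in f:
            line = raw.strip()
            question = re.match(r"^(\d+)\.\s+(.*\?)$", line)   # e.g. "1. What is your ***?"
            if question:
                questions.append((int(question.group(1)), question.group(2)))
                current = []
                options.append(current)
            elif current is not None and re.match(r"^\d+\.\s+", line):
                current.append(re.sub(r"^\d+\.\s+", "", line))  # "1. Ontario" -> "Ontario"

    print(questions)
    print(options)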

Read more

Feature Extraction and Embeddings in NLP: A Beginner's Guide to Understanding Natural Language Processing

This article was published as a part of the Data Science Blogathon. Introduction: In Natural Language Processing, Feature Extraction is one of the basic steps to be followed for a better understanding of the context of what we are dealing with. After the initial text is cleaned and normalized, we need to transform it into features that can be used for modeling. We use particular methods to assign weights to particular words within our document before modeling them. We go […]
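As one concrete illustration of turning cleaned text into weighted numeric features (the toy corpus and the use of gensim's Word2Vec below are assumptions for this listing, not the article's code), word embeddings map each token to a dense vector learned from its context:

    from gensim.models import Word2Vec

    # Toy corpus: each document is a list of normalized tokens
    corpus = [
        ["nlp", "extracts", "features", "from", "text"],
        ["word", "embeddings", "map", "words", "to", "dense", "vectors"],
        ["similar", "words", "get", "similar", "vectors"],
    ]

    # Train a small Word2Vec model (gensim 4.x API; all parameters are illustrative)
    model = Word2Vec(corpus, vector_size=32, window=3, min_count=1, epochs=50, seed=1)

    print(model.wv["vectors"][:5])           # first dimensions of one word's embedding
    print(model.wv.most_similar("words"))    # nearest neighbours in the embedding space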

Read more

A Python application to encode and decode text

Text Encoder and Decoder: encode and decode text in many ways using this GUI application! Encode in: ASCII85, Base85, Base64, Base32, Base16, URL, MD5 hash, SHA-1, SHA-224, SHA-384, SHA-256, SHA-512. Decode in: ASCII85, Base85, Base64, Base32, Base16, URL. GitHub: https://github.com/nonimportant/text-encode-and-decoder
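The application's own source is linked above; as a rough illustration of the same idea using only the Python standard library (the sample text is arbitrary), the base-N encodings are reversible while the hashes are one-way digests:

    import base64
    import hashlib

    data = "Hello, world!".encode("utf-8")

    # Reversible encodings: the original text can be recovered by decoding
    b64 = base64.b64encode(data)
    a85 = base64.a85encode(data)
    print(b64, base64.b64decode(b64).decode("utf-8"))
    print(a85, base64.a85decode(a85).decode("utf-8"))

    # One-way hashes: they cannot be decoded, only recomputed and compared
    print(hashlib.md5(data).hexdigest())
    print(hashlib.sha256(data).hexdigest())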

Read more

Indexing in Natural Language Processing for Information Retrieval

This article was published as a part of the Data Science Blogathon. Overview: This blog covers GREP (Global Regular Expression Print) and its drawbacks. Then we move on to the Document Term Matrix and the Inverted Index. Finally, we end with dynamic and distributed indexing. (Image source: https://javarevisited.blogspot.com/2011/06/10-examples-of-grep-command-in-unix-and.html#axzz6zwakOXgt) Global Regular Expression Print: Whenever we are dealing with a small amount of data, we can use the grep command very efficiently. It allows us to search one or more files for lines that contain a pattern. For […]
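As a small illustrative sketch (the three toy documents below are invented for this listing, not taken from the blog), an inverted index maps each term to the set of documents containing it, so a multi-term query becomes an intersection of posting lists rather than a grep-style scan of every file:

    from collections import defaultdict

    docs = {
        1: "grep searches files for lines that match a pattern",
        2: "an inverted index maps each term to the documents containing it",
        3: "indexing makes information retrieval fast at scale",
    }

    # Build the inverted index: term -> set of document ids (its posting list)
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)

    print(sorted(index["index"]))                  # -> [2]
    # A query with several terms intersects their posting lists
    print(sorted(index["grep"] & index["files"]))  # -> [1]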

Read more