Text Preprocessing

This is the second step of the end-to-end NLP pipeline. In this step, we generally perform basic preprocessing followed by advanced preprocessing, although the exact steps depend on the problem. Let’s look at the steps of text preprocessing. Lowercasing: This is the first step of data preprocessing. It’s compulsory for all kinds of problems because whenever we work on an …
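The lowercasing step can be sketched in Python as follows. This is a minimal illustration that assumes plain whitespace tokenization, and the helper names `lowercase` and `vocabulary` are hypothetical, not taken from the post. It shows how lowercasing merges case variants such as "NLP" and "nlp" into a single token, which shrinks the vocabulary.

```python
def lowercase(text: str) -> str:
    """Return the text with every character converted to lowercase."""
    return text.lower()


def vocabulary(text: str) -> set[str]:
    """Collect unique tokens, assuming plain whitespace tokenization (hypothetical helper)."""
    return set(text.split())


if __name__ == "__main__":
    raw = "NLP is fun and nlp is Useful"
    print(len(vocabulary(raw)))             # 6: 'NLP' and 'nlp' are counted separately
    print(len(vocabulary(lowercase(raw))))  # 5: case variants are merged
```

For text outside English, `str.casefold()` is a more aggressive alternative to `str.lower()` (for example, it maps the German "ß" to "ss").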


Similarity to Probability — Part I: Visual Word Embedding for OCR Post Correction

In this post, I will revisit in more detail our previous work, which uses human-inspired likelihood revision, or similarity to probability [Blok et al. 2003], to re-rank or score any word or text fragment based on its semantic relation to an external context. We will use popular pre-trained semantic similarity models (e.g., word2vec, GloVe, or fastText) to compute these relations.
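Similarity-to-probability re-ranking can be sketched as a minimal example in Python. The sketch assumes the likelihood-revision form usually attributed to Blok et al. (2003), P(w|c) = P(w)^α with α = ((1 - sim(w, c)) / (1 + sim(w, c)))^(1 - P(c)), where sim is the cosine similarity between word and context embeddings. The three-dimensional vectors, candidate words, probabilities, and the helpers `revise_probability` and `rerank` are all hypothetical stand-ins for a real pre-trained model and OCR output, not the post's implementation.

```python
import math

# Hypothetical toy embeddings; in practice these would come from a
# pre-trained model such as word2vec, GloVe, or fastText.
EMBEDDINGS = {
    "coffee": [0.9, 0.1, 0.0],
    "cup": [0.8, 0.2, 0.1],
    "cap": [0.0, 0.1, 0.9],
}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def revise_probability(p_word: float, p_context: float, sim: float) -> float:
    """Assumed revision rule: P(w|c) = P(w) ** alpha,
    alpha = ((1 - sim) / (1 + sim)) ** (1 - P(c))."""
    # Clamp sim so that sim = -1 cannot cause a division by zero.
    sim = max(-1.0 + 1e-9, min(1.0, sim))
    alpha = ((1.0 - sim) / (1.0 + sim)) ** (1.0 - p_context)
    return p_word ** alpha


def rerank(candidates: dict[str, float], context: str,
           p_context: float) -> list[tuple[str, float]]:
    """Re-score candidates (word -> baseline probability) against a context word."""
    ctx_vec = EMBEDDINGS[context]
    scored = [
        (word, revise_probability(p, p_context,
                                  cosine_similarity(EMBEDDINGS[word], ctx_vec)))
        for word, p in candidates.items()
    ]
    # Highest revised score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical OCR output in which the baseline slightly prefers "cap".
    ranking = rerank({"cup": 0.3, "cap": 0.4}, context="coffee", p_context=0.5)
    print(ranking)  # "cup" moves to the top because it is close to "coffee"
```

Under this rule, a similarity of 0 leaves the prior unchanged, a similarity approaching 1 pushes the score toward 1, and a fully confident context (P(c) = 1) also leaves the prior untouched. The revised scores are not normalized across candidates, which is sufficient for ranking.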
