Part 20: Step by Step Guide to Master NLP – Information Retrieval

This article was published as a part of the Data Science Blogathon. Introduction: This article is part of an ongoing blog series on Natural Language Processing (NLP). The previous article completed our discussion of Topic Modelling techniques. This article covers an important application of NLP, Information Retrieval: its basic concepts, along with some of the models used in it. NOTE: […]

Indexing in Natural Language Processing for Information Retrieval

This article was published as a part of the Data Science Blogathon. Overview: This blog covers GREP (Global Regular Expression Print) and its drawbacks. Then we move on to the Document Term Matrix and Inverted Matrix. Finally, we end with dynamic and distributed indexing. Global Regular Expression Print: Whenever we are dealing with a small amount of data, we can use the grep command very efficiently. It allows us to search one or more files for lines that contain a pattern. For […]

Information Retrieval System explained in simple terms!

Introduction: While searching for things on the internet, I always wondered what kind of algorithms run behind search engines to provide us with the most relevant information. How do they decide which results to show for a given set of search keywords? This might be a no-brainer for a few people, but it is definitely an interesting problem for some of the best brains around the world. To find the answer, I read every guide, tutorial, and piece of learning material that came my way. Eventually, I learnt […]

How Search Engines like Google Retrieve Results: Introduction to Information Extraction using Python and spaCy

Overview: How do search engines like Google understand our queries and provide relevant results? Learn about the concept of information extraction. We will apply information extraction in Python using the popular spaCy library, so a lot of hands-on learning is ahead! Introduction: I rely heavily on search engines (especially Google) in my daily role as a data scientist. My search results span a variety of queries: Python code questions, machine learning algorithms, comparison of Natural Language Processing […]

Introductory guide to Information Retrieval using kNN and KDTree

Introduction: I love cricket as much as I love data science. A few years back (on 16 November 2013, to be precise), my favorite cricketer, Sachin Tendulkar, retired from international cricket. I spent that entire day reading articles and blogs about him on the web. By the end of the day, I had read close to 50 articles about him. Interestingly, while I was reading these articles, none of the websites suggested articles to me outside of Sachin or cricket. […]

Information Retrieval using word2vec based Vector Space Model

Overview: Learn about Information Retrieval (IR), Vector Space Models (VSM), and Mean Average Precision (MAP). Create a project on Information Retrieval using a word2vec-based Vector Space Model. Introduction: “Google it!” Isn’t that something we say every day? Whenever we come across something that we don’t know about, we “Google it.” Google Search is a great tool that can even be used to find a needle in a haystack. This generation absolutely relies on Google for answers to all kinds […]

Claraprint: a chord and melody based fingerprint for western classical music cover detection

Cover song detection has been an active field in the Music Information Retrieval (MIR) community during the past decades. Most of the research community has focused on solving it for a wide range of music genres with diverse characteristics… Western classical music, a genre heavily based on the recording of “cover songs”, or musical works, represents a large heritage, offering immediate application for an efficient fingerprint algorithm. We propose an engineering approach for retrieving a cover song from a reference database […]

Multilingual Music Genre Embeddings for Effective Cross-Lingual Music Item Annotation

Annotating music items with music genres is crucial for music recommendation and information retrieval, yet challenging given that music genres are subjective concepts. Recently, in order to explicitly consider this subjectivity, the annotation of music items was modeled as a translation task: predict for a music item its music genres within a target vocabulary or taxonomy (tag system) from a set of music genre tags originating from other tag systems… However, without a parallel corpus, previous solutions could not handle […]
