Vector space based Information Retrieval System for Text Processing – Information retrieval

Sequence of operations:
1. Install the requirements.
2. Add the given Wikipedia files to the corpus directory.
3. Download the glove.6B.100d.txt dataset (skip this if it is already present) and place it in the project root directory.
4. Run construct_index.py
5. Run construct_index.py --zoned_index True
6. Run trim_embeddings.py
7. Run test_queries.py
8. Run test_queries.py --score_title True
9. Run test_queries.py --expand_query True

Installing requirements:
pip install -r requirements.txt

corpus
Contains the files to be indexed. Add files directly to this directory; do not create subdirectories. For this assignment, we have used the following files present in the […]
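The core of a vector space retrieval system can be sketched as a classic TF-IDF model: each document and each query becomes a log-weighted TF-IDF vector, normalized to unit length, and documents are ranked by cosine similarity to the query. This is a minimal sketch, not the project's construct_index.py or test_queries.py; the helper names tokenize, build_index and search, the regex tokenizer, and the log-tf times idf weighting are assumptions chosen for illustration. It leaves out the zoned index, title scoring, and GloVe-based query expansion.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits (a deliberately simple, hypothetical tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(vec):
    # Scale a sparse {term: weight} vector to unit length (cosine normalization).
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def weigh(tf, idf):
    # Log-scaled term frequency times inverse document frequency.
    return {t: (1 + math.log10(c)) * idf[t] for t, c in tf.items()}


def build_index(docs):
    """Return unit-length TF-IDF vectors for docs plus the IDF table (hypothetical helper)."""
    n = len(docs)
    tfs = [Counter(tokenize(d)) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # count each term once per document
    idf = {t: math.log10(n / df_t) for t, df_t in df.items()}
    vectors = [normalize(weigh(tf, idf)) for tf in tfs]
    return vectors, idf


def search(query, vectors, idf, k=3):
    """Rank documents by cosine similarity to the query; return [(doc_id, score)]."""
    # Query terms absent from the corpus carry no weight, so drop them.
    qtf = Counter(t for t in tokenize(query) if t in idf)
    qvec = normalize(weigh(qtf, idf))
    results = []
    for doc_id, dvec in enumerate(vectors):
        # Both vectors are unit length, so the dot product is the cosine similarity.
        score = sum(w * dvec.get(t, 0.0) for t, w in qvec.items())
        if score > 0:
            results.append((doc_id, score))
    results.sort(key=lambda p: (-p[1], p[0]))  # best score first, ties broken by doc id
    return results[:k]


if __name__ == "__main__":
    docs = [
        "Information retrieval ranks documents by relevance to a query.",
        "The vector space model represents documents as term weight vectors.",
        "GloVe embeddings map words to dense vectors.",
    ]
    vectors, idf = build_index(docs)
    print(search("vector space model", vectors, idf))
```

Running the script prints only document 1 with its cosine score, since no other document shares a query term. The logarithmic tf scaling is one common design choice: a term repeated many times in a document gains weight only slowly, so long, repetitive documents do not dominate the ranking.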

A Python (2 and 3) library for processing textual data

Homepage: https://textblob.readthedocs.io/

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly […]