Introduction to StanfordNLP: An Incredible State-of-the-Art NLP Library for 53 Languages (with Python code)

Introduction A common challenge I came across while learning Natural Language Processing (NLP) – can we build models for non-English languages? The answer has been no for quite a long time. Each language has its own grammatical patterns and linguistic nuances. And there just aren’t many datasets available in other languages. That’s where Stanford’s latest NLP library steps in – StanfordNLP. I could barely contain my excitement when I read the news last week. The authors claimed StanfordNLP could support more […]

Read more

DataHack Radio #23: Ines Montani and Matthew Honnibal – The Brains behind spaCy

https://soundcloud.com/datahack-radio/ines-montani-matthew-honnibal-the-brains-behind-spacy Introduction What would you do if you had the chance to pick the brains behind one of the most popular Natural Language Processing (NLP) libraries of our era? A library that has helped usher in the current boom in NLP applications and nurtured tons of NLP scientists? Well – you invite the creators on our popular DataHack Radio podcast and let them do the talking! We are delighted to welcome Ines Montani and Matt Honnibal, the developers of spaCy […]

Read more

Introductory guide to Information Retrieval using kNN and KDTree

Introduction I love cricket as much as I love data science. A few years back (on 16 November 2013 to be precise), my favorite cricketer – Sachin Tendulkar retired from International Cricket. I spent that entire day reading articles and blogs about him on the web. By the end of the day, I had read close to 50 articles about him. Interestingly, while I was reading these articles – none of the websites suggested me articles outside of Sachin or cricket. […]

Read more

DataHack Radio #12: Exploring the Nuts and Bolts of Natural Language Processing with Sebastian Ruder

https://soundcloud.com/datahack-radio/episode-12-sebastian-ruder Introduction There’s text everywhere around us, from digital sources like social media to physical objects like books and print media. The amount of text data being generated every day is mind boggling and yet we’re not even close to harnessing the full power of natural language processing. I see a ton of aspiring data scientists interested in this field, but they often turn away daunted by the challenges NLP presents. It’s such a niche line of work, and we […]

Read more

Sentiment Analysis in Python With TextBlob

Introduction State-of-the-art technologies in NLP allow us to analyze natural languages on different layers: from simple segmentation of textual information to more sophisticated methods of sentiment categorizations. However, it does not inevitably mean that you should be highly advanced in programming to implement high-level tasks such as sentiment analysis in Python. Sentiment Analysis The algorithms of sentiment analysis mostly focus on defining opinions, attitudes, and even emoticons in a corpus of texts. The range of established sentiments significantly varies from […]

Read more

How I used NLP (Spacy) to screen Data Science Resumes

Resume making is very tricky. A candidate has many dilemmas, whether to state a project at length or just mention the bare minimum whether to mention many skills or just mention his/her core competency skill whether to mention many programming languages or just cite a few whether to restrict the resume to 2 pages or 1 page These dilemmas are equally hard for Data Scientists looking for a change or even for aspiring Data Scientist. Now before you wonder where […]

Read more

How Can Python Help Solve Machine Learning Challenges?

Summary: Python’s open-source and high-level nature, as well as its comprehensive libraries, make it the perfect fit to solve the numerous real-life ML challenges. The increasing popularity and accessibility of Artificial Intelligence solutions is rapidly reshaping many industries, from healthcare through finance to aviation. Although the application of the latest technologies has always been an essential consideration for companies striving to get ahead of the curve, the ubiquity of AI means that it’s becoming the core of many operations. But […]

Read more

Text Mining 101: A Stepwise Introduction to Topic Modeling using Latent Semantic Analysis (using Python)

Introduction Have you ever been inside a well-maintained library? I’m always incredibly impressed with the way the librarians keep everything organized, by name, content, and other topics. But if you gave these librarians thousands of books and asked them to arrange each book on the basis of their genre, they will struggle to accomplish this task in a day, let alone an hour! However, this won’t happen to you if these books came in a digital format, right? All the […]

Read more

Quick Introduction to Bag-of-Words (BoW) and TF-IDF for Creating Features from Text

The Challenge of Making Machines Understand Text “Language is a wonderful medium of communication” You and I would have understood that sentence in a fraction of a second. But machines simply cannot process text data in raw form. They need us to break down the text into a numerical format that’s easily readable by the machine (the idea behind Natural Language Processing!). This is where the concepts of Bag-of-Words (BoW) and TF-IDF come into play. Both BoW and TF-IDF are […]

Read more

Introduction to Structuring Customer complaints explained with examples

Introduction In past, if you were not particularly happy with a service or a product, you would go to the service provider or the shop and lodge a complaint. With services-businesses going online and due to enormous scale, lodging complaints in-person may not be always possible. Electronic ways such as emails, social media and particularly websites like www.consumercomplaints.in focusing on such issues, are widely used platforms to vent out the anger as well as publicizing the issue in expectancy of […]

Read more
1 2 3 4 5 11