Regex Cheatsheet For Natural Language Processing tasks

This article was published as a part of the Data Science Blogathon Introduction Regex is a shorthand for Regular Expression. It is a representation for a set, a set of strings. Say we have a list of emails and we want to check if they are in the correct format or not. One way is to check each and every mail manually but that’s not possible if the number of mails is quite high. So, regex here comes to your rescue. […]

Read more

Part 13: Step by Step Guide to Master NLP – Regular Expressions

This article was published as a part of the Data Science Blogathon Introduction This article is part of an ongoing blog series on Natural Language Processing (NLP). From this article, we will start our discussion on Regular Expressions. When a data scientist comes across a text processing problem whether it is searching for titles in names or dates of birth in a dataset, regular expressions rear their ugly head very frequently. They form part of the basic techniques in NLP and […]

Read more

Extracting information from reports using Regular Expressions Library in Python

Introduction Many times it is necessary to extract key information from reports, articles, papers, etc. For example names of companies – prices from financial reports, names of judges – jurisdiction from court judgments, account numbers from customer complaints, etc. These extractions are part of Text Mining and are essential in converting unstructured data to a structured form which are later used for applying analytics/machine learning. Such entity extraction uses approaches like ‘lookup’, ‘rules’ and ‘statistical/machine learning’. In ‘lookup’ based approaches, […]

Read more

Beginners Tutorial for Regular Expressions in Python

Importance of Regular Expressions In last few years, there has been a dramatic shift in usage of general purpose programming languages for data science and machine learning. This was not always the case – a decade back this thought would have met a lot of skeptic eyes! This means that more people / organizations are using tools like Python / JavaScript for solving their data needs. This is where Regular Expressions become super useful. Regular expressions are normally the default way […]

Read more