FuzzyWuzzy Python Library: Interesting Tool for NLP and Text Analytics

This article was published as a part of the Data Science Blogathon

Introduction

There are many ways to compare text in Python, but we often look for an easy one. Comparing text is needed for many text analytics and Natural Language Processing (NLP) tasks.

One of the easiest ways to compare text in Python is the FuzzyWuzzy library. Given two strings, it returns a score out of 100 based on how similar they are. In effect, the score is a similarity index: the higher it is, the more similar the strings. The library uses the Levenshtein distance to calculate the differences between strings.
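To make the idea of a score out of 100 concrete, here is a minimal sketch that uses only Python's standard library. It assumes FuzzyWuzzy's pure-Python fallback, in which fuzz.ratio scales the ratio reported by difflib.SequenceMatcher to the 0-100 range and rounds it. With the optional python-Levenshtein package installed, the library switches to a faster C matcher, so scores may differ slightly on some inputs. The helper name simple_ratio is hypothetical and not part of FuzzyWuzzy.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Return a 0-100 similarity score for two strings.

    Hypothetical helper that approximates fuzzywuzzy's fuzz.ratio when the
    pure-Python difflib fallback is in use.
    """
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matching characters and T is the total length of both strings.
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


print(simple_ratio("this is a test", "this is a test!"))  # 97
print(simple_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 91
```

With the library installed, from fuzzywuzzy import fuzz followed by fuzz.ratio("this is a test", "this is a test!") should report the same score of 97 for the first pair.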

(Image: FuzzyWuzzy. Source: https://www.pexels.com/)

Levenshtein Distance

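The Levenshtein distance is a string metric that measures how different two strings are: it is the minimum number of single-character edits (insertions, deletions or substitutions) needed to change one string into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who formulated it.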
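The distance is the minimum number of single-character insertions, deletions and substitutions that turn one string into the other, and a common way to compute it is the Wagner-Fischer dynamic-programming table. The sketch below keeps only two rows of that table to save memory. The function name levenshtein is our own illustrative choice, not part of FuzzyWuzzy's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Row 0: turning the empty string into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Row i: distances from a[:i] to each prefix b[:j];
        # b[:0] (the empty string) needs i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Here "kitten" becomes "sitting" with two substitutions (k to s, e to i) and one insertion (g), hence a distance of 3. The table approach runs in O(|a|·|b|) time.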

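The Levenshtein distance between two strings a and b (of length |a| and |b| respectively) is given by lev(a, b), where

$$
\operatorname{lev}(a, b) =
\begin{cases}
|a| & \text{if } |b| = 0, \\
|b| & \text{if } |a| = 0, \\
\operatorname{lev}\big(\operatorname{tail}(a), \operatorname{tail}(b)\big) & \text{if } \operatorname{head}(a) = \operatorname{head}(b), \\
1 + \min \begin{cases}
\operatorname{lev}\big(\operatorname{tail}(a), b\big) \\
\operatorname{lev}\big(a, \operatorname{tail}(b)\big) \\
\operatorname{lev}\big(\operatorname{tail}(a), \operatorname{tail}(b)\big)
\end{cases} & \text{otherwise,}
\end{cases}
$$

where head(x) is the first character of a string x and tail(x) is x with its first character removed. The three terms inside the minimum correspond to deleting a character from a, inserting a character into a, and substituting one character for another, respectively.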
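The recursive definition of lev(a, b) can be transcribed almost case for case into Python. In this sketch, tail(a) (a with its first character removed) is represented by an index i into a, so a[i:] plays the role of the remaining string, and functools.lru_cache memoizes repeated subproblems so the running time stays at O(|a|·|b|) instead of growing exponentially. The recursion depth grows with |a| + |b|, so this version is meant for short strings only. The names lev and go are illustrative, not taken from FuzzyWuzzy.

```python
from functools import lru_cache


def lev(a: str, b: str) -> int:
    """Levenshtein distance, transcribed from the recursive definition."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        # a[i:] and b[j:] stand for the tails still to be compared.
        if j == len(b):          # |b| = 0: delete the rest of a
            return len(a) - i
        if i == len(a):          # |a| = 0: insert the rest of b
            return len(b) - j
        if a[i] == b[j]:         # head(a) = head(b): no edit needed
            return go(i + 1, j + 1)
        return 1 + min(
            go(i + 1, j),        # lev(tail(a), b): deletion
            go(i, j + 1),        # lev(a, tail(b)): insertion
            go(i + 1, j + 1),    # lev(tail(a), tail(b)): substitution
        )

    return go(0, 0)


print(lev("kitten", "sitting"))  # 3
print(lev("flaw", "lawn"))       # 2
```

Each branch of go matches one case of the definition, which makes this form handy for checking the formula by hand; a bottom-up table is the usual choice for long strings.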
