A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution

dedupe is a Python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on structured data. dedupe will help you: remove duplicate entries from a spreadsheet of names and addresses; link a list with customer information to another with order history, even without unique customer IDs; take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record. dedupe […]
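To make the idea of matching records whose names "were entered slightly differently" concrete, here is a minimal sketch that uses only the Python standard library. It is not dedupe's API or its learning-based method: it swaps in a plain string-similarity ratio from `difflib`. The helper names (`similarity`, `find_likely_duplicates`), the 0.85 threshold, and the sample records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and repeated whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def find_likely_duplicates(records, threshold=0.85):
    """Return index pairs whose name + address text is at least `threshold` similar.

    Hypothetical helper: compares every pair, so it is O(n^2) and only
    suitable for small lists.
    """
    pairs = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        left_text = f"{left['name']} {left['address']}"
        right_text = f"{right['name']} {right['address']}"
        if similarity(left_text, right_text) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical sample records, invented for illustration only.
records = [
    {"name": "Jon Smith", "address": "12 Main St"},
    {"name": "John Smith", "address": "12 Main St."},
    {"name": "Alice Wong", "address": "98 Oak Avenue"},
]

duplicates = find_likely_duplicates(records)
print(duplicates)
```

A fixed threshold on one similarity score is the simplest possible rule. A library such as dedupe instead learns from labeled examples how to weigh fields, and avoids comparing every pair, which is what makes it practical on larger datasets.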


A Python package for performing entity and text matching using deep learning

DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and utilities that enable you to train and apply state-of-the-art deep learning models for entity matching in fewer than 10 lines of code. The models are also easily customizable: the modular design allows any subcomponent to be altered or swapped out for a custom implementation. As an example, given labeled tuple pairs such as the following: DeepMatcher uses labeled tuple […]
