A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution
dedupe is a Python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on structured data.

dedupe will help you:

- remove duplicate entries from a spreadsheet of names and addresses
- link a list with customer information to another with order history, even without unique customer IDs
- take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe […]
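To make "fuzzy matching" concrete, here is a minimal sketch that uses only the Python standard library. It is not dedupe's API and not how dedupe works internally: dedupe learns its matching rules with machine learning, while this sketch assumes a fixed, hand-picked similarity threshold. The function names (`similarity`, `find_likely_duplicates`), the threshold of 0.8, and the sample records are all hypothetical, chosen purely for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and repeated whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def find_likely_duplicates(records, threshold=0.8):
    """Return index pairs (i, j) whose records are at least `threshold` similar.

    Every pair is compared, so the cost grows quadratically with the
    number of records; this is only suitable for small lists.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical sample data, made up for this illustration.
contributors = [
    "Jonathan Smith, 12 Oak Street",
    "Jonathon Smith, 12 Oak St.",
    "Maria Garcia, 400 Pine Avenue",
    "jonathan smith,  12 oak street",
]

print(find_likely_duplicates(contributors))
```

The three "Smith" entries are paired with one another even though they differ in spelling, abbreviation, case and spacing, while the unrelated record is left alone. A fixed threshold like this is brittle, and comparing every pair does not scale, which is the kind of limitation a trained, scalable approach is meant to address.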