Python tutorials

Minun and Explainable Entity Matching

Given two collections of entities, such as product listings, entity matching (EM) aims to identify all pairs that refer to the same real-world object, such as a product, publication, or business. Recently, deep learning (DL) techniques have been widely applied to the EM problem and have achieved promising results. Unfortunately, the performance gain brought by DL techniques comes at the cost of reduced transparency and interpretability. The reason is that DL-based approaches are more like black-box […]
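
To make the problem statement concrete, here is a minimal sketch of entity matching as pairwise comparison. It is not Minun's method and not a DL approach: it assumes a simple token-overlap (Jaccard) score and a hypothetical threshold of 0.5, and the product strings are invented for illustration.

```python
from itertools import product


def tokens(text):
    """Lowercase a listing and split it into a set of whitespace tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-set overlap between two listings, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match(left, right, threshold=0.5):
    """Return every (left, right) pair whose score reaches the threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return [(l, r) for l, r in product(left, right) if jaccard(l, r) >= threshold]


# Invented example listings from two hypothetical catalogues.
left = ["Apple iPhone 13 128GB", "Sony WH-1000XM4 headphones"]
right = ["iPhone 13 Apple 128GB black", "Bose QC45 headphones"]

pairs = match(left, right)
print(pairs)
```

A score like this is easy to inspect, since the shared tokens are visible directly; that kind of transparency is what the excerpt says is lost when a DL model makes the matching decision.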

Using Conda? You might not need Docker

Docker packaging is useful, but doing it well is not easy. Even limiting the scope of discussion to production use of Python applications, the number of details to cover is extensive enough that I’ve written over 50 articles on the topic, and created a number of products to speed up the packaging process. In a better universe, none of this would be necessary. So while Docker is often useful enough to merit this effort, in some situations you might be […]

What Does if __name__ == "__main__" Do in Python?

You’ve likely encountered Python’s if __name__ == "__main__" idiom when reading other people’s code. No wonder, it’s widespread! You might have even used if __name__ == "__main__" in your own scripts. But did you use it correctly? Maybe you’ve programmed in a C-family language like Java before, and you wonder whether this construct is a clumsy accessory to using a main() function as an entry point. Syntactically, Python’s if __name__ == "__main__" idiom is just a normal conditional block: if __name__ […]
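
As a minimal sketch of the idiom the excerpt describes (the names `greet` and `main` are illustrative, not taken from the article), the conditional block runs only when the file is executed directly, not when it is imported as a module:

```python
def greet(name):
    """Return a greeting; safe to import because it has no side effects."""
    return f"Hello, {name}!"


def main():
    # Entry point for script use only.
    print(greet("world"))


# __name__ is "__main__" when this file is run as a script.
# When the file is imported, __name__ is the module's name instead,
# so main() is not called.
if __name__ == "__main__":
    main()
```

Keeping the script logic inside `main()` means other modules can import `greet` without triggering the print.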

N-gram Language Model Based Next Word Suggestion — Part 1

Have you ever wondered how your phone is able to suggest the next word while you type a message? Or how it can even correct your spelling on the fly, turning your angry WhatsApp message to your boss into a laughing matter because you end up telling them to go and duck themselves? One of the many ways to achieve next-word suggestion is to use an n-gram language model. This article briefly overviews what an n-gram language model is and how […]
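
A rough sketch of the idea, assuming the simplest case of a bigram (n = 2) model built from raw counts with no smoothing. The function names and the tiny corpus are invented for illustration and are not taken from the article:

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def suggest(follows, word, k=3):
    """Return up to k words most frequently seen after `word`."""
    counts = follows.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]


# Toy corpus; a real model would be trained on far more text.
corpus = "i am happy . i am tired . i was late"
model = train_bigrams(corpus)

print(suggest(model, "i", k=2))
```

Here "am" follows "i" twice and "was" once, so "am" is ranked first. A word that never appeared in the corpus yields an empty list, which is one reason practical models add smoothing or back-off.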

pySBD: Hidden Gem for Sentence Boundary Detection

Although it may seem simple, human language is noisy and complex. Dividing text into sentences based only on punctuation works only up to a point. The best thing about pySBD is that it handles a wide range of edge cases, including abbreviations, decimal numbers, and other challenging situations that are frequently seen in corpora from the legal, financial, and biomedical fields. pySBD recognises sentence boundaries using a rule-based method, in contrast to the majority of other […]
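
To illustrate why punctuation alone is not enough, here is a deliberately naive splitter written with the standard library only. It is not pySBD and does not use its API; the function name and the sample sentence are assumptions made for this sketch:

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' whenever whitespace follows."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


# Two real sentences containing abbreviations and a decimal number.
text = "Dr. Smith paid $3.50 for coffee. He left at 5 p.m. on Friday."

pieces = naive_split(text)
for piece in pieces:
    print(repr(piece))
```

The decimal in "$3.50" survives because no whitespace follows the period, but "Dr." and "p.m." are each treated as a sentence end, so two sentences come back as four fragments. Abbreviation cases like these are the kind a rule-based tool is meant to cover.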
