Use Tableau to Connect to DuckDB

I recently worked on a project where DuckDB fit the need. The ask was to demonstrate the construction of a data pipeline and highlight the analytical possibilities of a spatial tracking dataset. DuckDB is an impressive in-process, single-file database that proved to be not only blazingly fast but also an absolute delight to code against.

The Infamous GIL

In this series of posts, I am trying to break down one of the more complicated and intriguing topics in Python. Please visit my last post to understand why multithreading is broken in Python (take this comment with a pinch of salt): https://www.reddit.com/r/Python/comments/xdyahc/multithreading_a_concept_which_is_always/

Minun and Explainable Entity Matching

Given two collections of entities, such as product listings, the entity matching (EM) problem aims to identify all pairs that refer to the same object in the real world, such as products, publications, businesses, etc. Recently, deep learning (DL) techniques have been widely applied to the EM problem and have achieved promising results. Unfortunately, the performance gain brought by DL techniques comes at the cost of reducing transparency and interpretability. The reason is that DL-based approaches are more like black-box […]

Using Conda? You might not need Docker

Docker packaging is useful, but doing it well is not easy. Even limiting the scope of discussion to production use of Python applications, the number of details to cover is extensive enough that I’ve written over 50 articles on the topic, and created a number of products to speed up the packaging process. In a better universe, none of this would be necessary. So while Docker is often useful enough to merit this effort, in some situations you might be […]
