Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval
We introduce the concept of embedding quantization and showcase its impact on retrieval speed, memory usage, disk space, and cost. We’ll discuss how embeddings can be quantized in theory and in practice, after which we introduce a demo showing a real-life retrieval scenario of 41 million Wikipedia texts.
Table of Contents
Why Embeddings?
Embeddings are one