Training and Finetuning Sparse Embedding Models with Sentence Transformers

By Tom Aarsen and Arthur BRESNU

Sentence Transformers is a Python library for using and training dense embedding, reranker (cross-encoder), and sparse embedding models for a wide range of applications, such as retrieval-augmented generation, semantic search, semantic textual similarity, paraphrase mining, and more. In this blog post, we'll show you how to use it to finetune a sparse encoder/embedding model and explain why you might want to do so. The result is sparse-encoder/example-inference-free-splade-distilbert-base-uncased-nq, a cheap model that works especially well in hybrid search or retrieve-and-rerank scenarios.
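To make the idea concrete before diving into the library, here is a minimal, self-contained sketch of what "sparse" means in this context (toy code, not the Sentence Transformers API): each text maps to a high-dimensional, mostly-zero vector whose active dimensions correspond to vocabulary tokens, so similarity reduces to a dot product over the few dimensions two texts share. A trained SPLADE-style model learns these weights (and expands to related tokens); the toy version below just counts surface tokens.

```python
# Toy illustration of sparse embeddings: represent each text as a
# {token: weight} mapping (a mostly-zero vector over the vocabulary).
# A real sparse encoder learns these weights; here we simply count tokens.

def sparse_embed(text):
    """Stand-in for a learned sparse encoder: token -> weight."""
    weights = {}
    for token in text.lower().split():
        weights[token] = weights.get(token, 0.0) + 1.0
    return weights

def dot(a, b):
    """Similarity only touches dimensions active in BOTH vectors."""
    return sum(w * b[t] for t, w in a.items() if t in b)

query = sparse_embed("sparse embedding models")
doc1 = sparse_embed("training sparse embedding models with sentence transformers")
doc2 = sparse_embed("recipes for chocolate cake")

print(dot(query, doc1))  # overlapping tokens -> positive score (3.0)
print(dot(query, doc2))  # no overlap -> 0.0
```

This sparsity is what makes such models cheap to store and fast to query with inverted indexes, and why they pair well with dense models in hybrid search.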

Finetuning sparse embedding models involves several components: the model, datasets, loss functions, training arguments, evaluators, and the trainer class. We'll look at each of these components, accompanied by practical examples of how they can be used to finetune strong sparse embedding models.
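As a conceptual preview of how those components fit together (hypothetical toy classes, not the Sentence Transformers API), the trainer wires a model, a dataset, a loss function, training arguments, and an evaluator into a single training loop. The sketch below shows that wiring with a one-parameter model and a squared-error loss:

```python
# Toy illustration of the trainer pattern: model + dataset + loss +
# training arguments + evaluator, combined by a trainer class.
# (Hypothetical classes for illustration; the real library handles
# batching, tokenization, and sparse-specific losses for you.)

class ToyModel:
    """One-parameter 'model': y = w * x."""
    def __init__(self):
        self.w = 0.0
    def __call__(self, x):
        return self.w * x

def squared_error(model, batch):
    x, y = batch
    return (model(x) - y) ** 2

class ToyTrainer:
    def __init__(self, model, dataset, loss_fn, args, evaluator):
        self.model, self.dataset = model, dataset
        self.loss_fn, self.args, self.evaluator = loss_fn, args, evaluator

    def train(self):
        lr = self.args["learning_rate"]
        for _ in range(self.args["epochs"]):
            for x, y in self.dataset:
                # Analytic gradient of the squared-error loss w.r.t. w
                grad = 2 * (self.model(x) - y) * x
                self.model.w -= lr * grad
        # The evaluator measures quality after training
        return self.evaluator(self.model)

dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # target relation: y = 2x
evaluator = lambda m: sum(abs(m(x) - y) for x, y in dataset) / len(dataset)
args = {"epochs": 50, "learning_rate": 0.05}

trainer = ToyTrainer(ToyModel(), dataset, squared_error, args, evaluator)
print(trainer.train())  # mean absolute error approaches 0 as w -> 2
```

The real trainer plays the same role, but with a pretrained transformer as the model, Hugging Face datasets, sparse-retrieval losses, and evaluators that report retrieval metrics.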

In addition to training your own models, you can choose from a wide range of pretrained sparse embedding models on the Hugging Face Hub.
