SimCSE: Simple Contrastive Learning of Sentence Embeddings

This repository contains the code and pre-trained models for our paper SimCSE: Simple Contrastive Learning of Sentence Embeddings.
Overview
We propose a simple contrastive learning framework that works with both unlabeled and labeled data. Unsupervised SimCSE simply takes an input sentence and predicts itself in a contrastive learning framework, with only standard dropout used as noise. Our supervised SimCSE incorporates annotated pairs from NLI datasets into contrastive learning by using entailment pairs as positives and contradiction pairs as hard negatives. The following figure is an illustration of our models.
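The objective described in this overview can also be sketched in code. The snippet below is a minimal, dependency-free illustration and not the repository's training code: the function names `cosine` and `contrastive_loss`, the toy vectors, and the temperature value of 0.05 are all assumptions made for this example. It treats the i-th positive as the target for the i-th anchor, uses every other candidate in the batch as a negative, and optionally appends hard negatives to the candidate pool.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchors, positives, hard_negatives=None, temperature=0.05):
    """Mean cross-entropy over similarity scores (hypothetical helper).

    anchors[i] should match positives[i]. All other positives, plus any
    hard negatives, act as negatives for anchor i.
    """
    candidates = list(positives)
    if hard_negatives is not None:
        candidates += list(hard_negatives)

    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the matching positive.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings standing in for encoder outputs (assumed values).
# Unsupervised setting: anchors and positives would be two encodings of the
# same sentences, differing only in the dropout mask applied.
anchors = [[1.0, 0.1], [0.1, 1.0]]
positives = [[0.9, 0.2], [0.2, 0.9]]
unsupervised_loss = contrastive_loss(anchors, positives)

# Supervised setting: positives would come from entailment pairs and
# hard negatives from contradiction pairs.
hard_negatives = [[-1.0, 0.3], [0.3, -1.0]]
supervised_loss = contrastive_loss(anchors, positives, hard_negatives)
```

In a real setup the vectors would come from a sentence encoder, and the loss would typically be computed with batched tensor operations rather than Python loops. The loop form is used here only to keep each step visible.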
Getting Started
We provide an easy-to-use sentence embedding tool based on our