A multilingual version of MS MARCO passage ranking dataset


This repository presents a neural machine translation-based method for translating the MS MARCO passage ranking dataset. The code available here is the same code used in our paper, mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset.

Translated Datasets

As described in our paper, we have made available 8 translated versions of the MS MARCO passage ranking dataset. The translated passage collection and the query sets (training and validation) are available at:

Released Model Checkpoints

Our available fine-tuned models are:

* [email protected] on English MS MARCO

Dataset

We translate the MS MARCO passage ranking dataset, a large-scale IR dataset comprising

 

 

 
