Visual Document Retrieval Goes Multilingual
TL;DR: We present vdr-2b-multi-v1, the best multilingual embedding model for visual document retrieval. We also release its English-only twin, vdr-2b-v1, and open-source the new vdr-multilingual-train dataset. With 500k high-quality samples, it’s the largest open-source multilingual