PaliGemma 2 Mix – New Instruction Vision Language Models by Google

Last December, Google released PaliGemma 2: a new family of pre-trained (pt) PaliGemma vision language models (VLMs) based on SigLIP and Gemma 2. The models come in three different sizes (3B, 10B, 28B) and three different resolutions (224×224, 448×448, 896×896).
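As a rough illustration of the size and resolution grid described above, the following sketch enumerates the nine pt combinations. The `checkpoint_name` helper and the `google/paligemma2-{size}-{variant}-{resolution}` naming pattern are assumptions made for this example, not something the post states; check the actual identifiers on the model hub before relying on them.

```python
from itertools import product

# Sizes and resolutions as listed in the post.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448, 896)


def checkpoint_name(size: str, resolution: int, variant: str = "pt") -> str:
    """Build a checkpoint identifier for one size/resolution pair.

    The naming pattern is an assumption for illustration only;
    verify the real identifiers before using them.
    """
    return f"google/paligemma2-{size}-{variant}-{resolution}"


# Three sizes times three resolutions gives nine pt combinations.
variants = [checkpoint_name(size, res) for size, res in product(SIZES, RESOLUTIONS)]

for name in variants:
    print(name)
```

Keeping sizes and resolutions as separate tuples makes it easy to restrict the grid, for example to the lower resolutions only, when a larger input is not needed.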

Today, Google is releasing PaliGemma 2 mix: checkpoints fine-tuned on a mix of vision language tasks, including OCR, long and short captioning, and more.

The PaliGemma 2 pre-trained (pt) variants are strong base models for transfer to a specific task. All pt checkpoints are meant to be fine-tuned on a downstream task and were released for
