SmolVLM2: Bringing Video Understanding to Every Device
SmolVLM2 represents a fundamental shift in how we think about video understanding – moving from massive models that require substantial computing resources to efficient models that can run anywhere. Our goal is simple: make video understanding accessible across all devices and use cases, from phones to servers.
We are releasing models in three sizes (2.2B, 500M, and 256M parameters), all MLX-ready (with Python and Swift APIs) from day zero.
We’ve made all models and demos available in this collection.
Want to try SmolVLM2 right away? Check out our interactive chat interface, where you can test visual and video understanding.