SmolVLA: Efficient Vision-Language-Action Model Trained on LeRobot Community Data
Today, we introduce SmolVLA, a compact (450M-parameter), open-source Vision-Language-Action model for robotics that runs on consumer hardware.
- Pretrained only on compatibly licensed, open-source community-shared datasets under the lerobot tag.
- SmolVLA-450M outperforms much larger VLAs and strong baselines such as ACT in simulation (LIBERO, Meta-World) and on real-world tasks (SO100, SO101).
- Supports asynchronous inference for 30% faster response and 2× task throughput.
Useful links: