SmolVLA: Efficient Vision-Language-Action Model Trained on LeRobot Community Data

Today, we introduce SmolVLA, a compact (450M-parameter), open-source Vision-Language-Action (VLA) model for robotics that runs on consumer hardware.

  • Pretrained only on compatibly licensed, open-source community-shared datasets under the lerobot tag.
  • SmolVLA-450M outperforms much larger VLAs and strong baselines such as ACT on simulation (LIBERO, Meta-World) and real-world tasks (SO100, SO101).
  • Supports asynchronous inference for 30% faster response and 2× task throughput.
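Asynchronous inference here means decoupling action execution from action-chunk prediction: while the robot executes the current chunk, the next chunk is already being computed, so model latency is hidden behind execution time. A minimal sketch of the idea (function names and structure are illustrative, not the LeRobot API):

```python
import queue
import threading

def predict_chunk(obs, chunk_size=4):
    # Stand-in for the VLA forward pass (hypothetical): the real model
    # maps images + a language instruction to a chunk of actions.
    return [obs + i for i in range(chunk_size)]

def async_rollout(n_chunks=3, chunk_size=4):
    """Run prediction in a worker thread while the main loop 'executes'
    actions, so the executor never waits for a full forward pass."""
    actions = queue.Queue()
    executed = []

    def predictor():
        obs = 0
        for _ in range(n_chunks):
            for a in predict_chunk(obs, chunk_size):
                actions.put(a)
            obs += chunk_size  # observation advances past the chunk
        actions.put(None)  # sentinel: rollout finished

    worker = threading.Thread(target=predictor)
    worker.start()
    while True:
        a = actions.get()
        if a is None:
            break
        executed.append(a)  # the robot would execute `a` here
    worker.join()
    return executed

print(async_rollout())  # actions stream out in order across chunks
```

In a synchronous loop, the robot idles during every forward pass; overlapping the two is what yields the reported response-time and throughput gains.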

