SmolVLA: Efficient Vision-Language-Action Model Trained on LeRobot Community Data

Today, we introduce SmolVLA, a compact (450M-parameter), open-source Vision-Language-Action (VLA) model for robotics that runs on consumer hardware.

  • Pretrained only on compatibly licensed, open-source community-shared datasets under the lerobot tag.
  • SmolVLA-450M outperforms much larger VLAs and strong baselines such as ACT on simulation (LIBERO, Meta-World) and real-world tasks (SO100, SO101).
  • Supports asynchronous inference for 30% faster response and 2× task throughput.
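Asynchronous inference here means decoupling action execution from action-chunk prediction: while the robot executes the current chunk, the next chunk is already being computed, so model latency is hidden behind execution time. A minimal sketch of the idea (function names and structure are illustrative, not the LeRobot API):

```python
import queue
import threading

def predict_chunk(obs, chunk_size=4):
    # Stand-in for the VLA forward pass (hypothetical): the real model
    # maps images + a language instruction to a chunk of actions.
    return [obs + i for i in range(chunk_size)]

def async_rollout(n_chunks=3, chunk_size=4):
    """Run prediction in a worker thread while the main loop 'executes'
    actions, so the executor never waits for a full forward pass."""
    actions = queue.Queue()
    executed = []

    def predictor():
        obs = 0
        for _ in range(n_chunks):
            for a in predict_chunk(obs, chunk_size):
                actions.put(a)
            obs += chunk_size  # observation advances past the chunk
        actions.put(None)  # sentinel: rollout finished

    worker = threading.Thread(target=predictor)
    worker.start()
    while True:
        a = actions.get()
        if a is None:
            break
        executed.append(a)  # the robot would execute `a` here
    worker.join()
    return executed

print(async_rollout())  # actions stream out in order across chunks
```

In a synchronous loop, the robot idles during every forward pass; overlapping the two is what yields the reported response-time and throughput gains.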

