Smol2Operator: Post-Training GUI Agents for Computer Use
TL;DR: This work shows how a lightweight vision–language model can acquire GUI-grounded skills and evolve into an agentic GUI coder. We release all training recipes, data-processing tools, the resulting model, a demo, and the datasets to enable full reproducibility and foster further research 🫡. Find the collection here.
This video shows the model obtained through the recipe described below executing a task end to end.