Smol2Operator: Post-Training GUI Agents for Computer Use

TL;DR: This work shows how a lightweight vision–language model can acquire GUI-grounded skills and evolve into an agentic GUI coder. We release all training recipes, data-processing tools, the resulting model, a demo, and the datasets to enable full reproducibility and foster further research 🫡. Find the collection here.


This video demonstrates the model obtained through the recipe described below, executing a task end-to-end.



