Smol2Operator: Post-Training GUI Agents for Computer Use

TL;DR: This work shows how a lightweight vision–language model can acquire GUI-grounded skills and evolve into an agentic GUI coder. We release all training recipes, data-processing tools, the resulting model, a demo, and the datasets to enable full reproducibility and foster further research 🫡. Find the collection here.

This video demonstrates the model obtained through the recipe described below, executing a task end-to-end.
