Training Design for Text-to-Image Models: Lessons from Ablations

Welcome back! This is the second part of our series on training efficient text-to-image models from scratch.

In the first post of this series, we introduced our goal: training a competitive text-to-image foundation model entirely from scratch, in the open, and at scale. We focused primarily on architectural choices and motivated the core design decisions behind our model PRX.
We also released an early, small (1.2B-parameter) version of the model as a preview of what we are building (go try it if you haven’t already 😉).

In this post, we
