Welcome aMUSEd: Efficient Text-to-Image Generation
We’re excited to present an efficient non-diffusion text-to-image model named aMUSEd. It’s called so because it’s a open reproduction of Google’s MUSE. aMUSEd’s generation quality is not the best and we’re releasing a research preview with a permissive license. In contrast to the commonly used latent diffusion approach (Rombach et al. (2022)), aMUSEd employs a Masked Image Model (MIM) methodology. This not only requires fewer inference steps, as noted by Chang et al. (2023), but also enhances the model’s interpretability. […]
Read more