Speculative Decoding for 2x Faster Whisper Inference

OpenAI’s Whisper is a general-purpose speech transcription model that achieves state-of-the-art results across a range of different benchmarks and audio conditions. The latest large-v3 model tops the OpenASR Leaderboard, ranking as the best open-source speech transcription model for English. The model also demonstrates strong multilingual performance, achieving less […]

Read more

LoRA training scripts of the world, unite!

A community-derived guide to some of the SOTA practices for SD-XL Dreambooth LoRA fine-tuning. TL;DR: We combined the Pivotal Tuning technique used in Replicate’s SDXL Cog trainer with the Prodigy optimizer used in the Kohya trainer (plus a bunch of other optimizations) to achieve very good results on training Dreambooth LoRAs for SDXL. Check […]

Read more

Welcome aMUSEd: Efficient Text-to-Image Generation

We’re excited to present an efficient non-diffusion text-to-image model named aMUSEd. It is so named because it is an open reproduction of Google’s MUSE. aMUSEd’s generation quality is not yet the best, so we’re releasing it as a research preview under a permissive license. In contrast to the commonly used latent diffusion approach (Rombach et al. (2022)), aMUSEd employs a Masked Image Model (MIM) methodology. This not only requires fewer inference steps, as noted by Chang et al. (2023), but also enhances the model’s interpretability. […]

Read more

A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara’s hallucination leaderboard

Hugging Face’s Open LLM Leaderboard (originally created by Ed Beeching and Lewis Tunstall, and maintained by Nathan Habib and Clémentine Fourrier) is well known for tracking the performance of open-source LLMs, comparing them on a variety of tasks such as TruthfulQA or HellaSwag. This has been of tremendous value to the open-source community, as it provides a way for practitioners to keep track of the best open-source models. In late 2023, at Vectara we introduced the Hughes Hallucination […]

Read more

Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive

SD Turbo and SDXL Turbo are two fast generative text-to-image models capable of generating viable images in as little as one step, a significant improvement over the 30+ steps often required with previous Stable Diffusion models. SD Turbo is a distilled version of Stable Diffusion 2.1, and SDXL Turbo is a distilled version of SDXL 1.0. We’ve previously shown how to accelerate Stable Diffusion inference with ONNX Runtime. Not only does ONNX Runtime provide performance benefits when used with SD […]

Read more

Preference Tuning LLMs with Direct Preference Optimization Methods

Addendum After consulting with the authors of the IPO paper, we discovered that the implementation of IPO in TRL was incorrect; in particular, the loss over the log-likelihoods of the completions needs to be averaged instead of summed. We have added a fix in this PR and re-run the experiments. The results are now consistent with the paper, with IPO on par with DPO and performing better than KTO in the paired preference setting. We have updated the post to […]
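The fix described above can be sketched in a few lines. This is an illustrative snippet, not TRL’s actual implementation: the function name and inputs are hypothetical, and the point is only to show why averaging token log-likelihoods (rather than summing them) keeps a completion’s score independent of its length.

```python
def completion_logprob(token_logps, average=True):
    """Aggregate per-token log-probabilities for one completion.

    Hypothetical helper for illustration: the incorrect IPO
    implementation summed the token log-likelihoods, so longer
    completions accumulated larger losses purely because they had
    more tokens. Averaging removes that length bias.
    """
    total = sum(token_logps)
    return total / len(token_logps) if average else total

# A short and a long completion with identical per-token quality:
short = [-0.5, -0.5]
long_ = [-0.5] * 10

# Summing conflates quality with length,
# while averaging scores both completions identically per token.
summed_gap = completion_logprob(long_, average=False) - completion_logprob(short, average=False)
avg_gap = completion_logprob(long_) - completion_logprob(short)
```

Here `summed_gap` is nonzero while `avg_gap` is zero, which is exactly the discrepancy the averaged loss removes.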

Read more

PatchTSMixer in HuggingFace – Getting Started

PatchTSMixer is a lightweight time-series modeling approach based on the MLP-Mixer architecture. It was proposed in TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting by IBM Research authors Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong and Jayant Kalagnanam. To broaden its reach and promote open-sourcing, IBM Research has joined hands with the Hugging Face team to release this model in the Transformers library. The Hugging Face implementation makes it straightforward to apply PatchTSMixer’s lightweight mixing across patches, […]

Read more