Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models
Transformer-based encoder-decoder models were proposed in Vaswani et al. (2017) and have recently experienced a surge of interest, e.g., Lewis et al. (2019), Raffel et al. (2019), Zhang et al. (2020), Zaheer et al. (2020), and Yan et al. (2020). …
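The post covers warm-starting an encoder-decoder model from pre-trained checkpoints. As a minimal sketch of the idea using 🤗 Transformers' `EncoderDecoderModel` class (the checkpoint names and toy inputs are illustrative):

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Warm-start: encoder and decoder weights come from the BERT checkpoint;
# only the decoder's cross-attention layers are newly initialized.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# A single toy training step on made-up data; in practice one would
# fine-tune on a sequence-to-sequence task such as summarization.
input_ids = tokenizer("This is a long article to summarize.", return_tensors="pt").input_ids
labels = tokenizer("A short summary.", return_tensors="pt").input_ids
loss = model(input_ids=input_ids, labels=labels).loss
```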
How we sped up transformer inference 100x for 🤗 API customers
🤗 Transformers has become the default library for data scientists all around the world to explore state-of-the-art NLP models and build new NLP features. With over 5,000 pre-trained and fine-tuned models available, in over 250 languages, it is a rich playground, easily accessible whichever framework you are working in. While experimenting with models in 🤗 Transformers is easy, deploying …
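Experimenting really is a few lines of work; as a minimal sketch (the task and the default checkpoint it pulls are just one example among the thousands on the hub):

```python
from transformers import pipeline

# Downloads a default sentiment-analysis checkpoint from the Hugging Face hub.
classifier = pipeline("sentiment-analysis")
print(classifier("Deploying this at scale is the hard part."))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99...}]
```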
Fit More and Train Faster With ZeRO via DeepSpeed and FairScale
A guest blog post by Hugging Face fellow Stas Bekman. Recent machine learning models have been growing much faster than the amount of GPU memory added to newly released cards, so many users are unable to train, or even just load, some of those huge models on their hardware. While there is an ongoing effort to distill some of those huge models …
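ZeRO shards optimizer states, gradients, and (at higher stages) parameters across data-parallel workers instead of replicating them on every GPU. A minimal sketch of switching it on through the 🤗 Trainer's DeepSpeed integration, assuming DeepSpeed is installed (the config values are illustrative):

```python
from transformers import TrainingArguments

# A pared-down ZeRO stage 2 config; the post and the DeepSpeed docs
# cover many more knobs (CPU offloading, stage 3, fp16, ...).
ds_config = {
    "zero_optimization": {
        "stage": 2,  # shard optimizer states and gradients across GPUs
    },
    "train_micro_batch_size_per_gpu": "auto",
}

args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=4,
    deepspeed=ds_config,  # also accepts a path to a JSON config file
)
```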
Faster TensorFlow models in Hugging Face Transformers
In the last few months, the Hugging Face team has been working hard on improving Transformers’ TensorFlow models to make them more robust and faster. The recent improvements are mainly focused on two aspects. Computational performance: BERT, RoBERTa, ELECTRA and MPNet have been improved in order to have a much faster computation time. …
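As a minimal sketch of loading one of the affected TensorFlow models (the checkpoint name is illustrative, and the XLA-compiled call at the end is just one common way to speed up repeated forward passes, not something the post mandates):

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, TensorFlow!", return_tensors="tf")
outputs = model(inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)

# Optionally compile the forward pass with XLA for faster repeated calls.
fast_forward = tf.function(model, jit_compile=True)
```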
Hugging Face on PyTorch / XLA TPUs: Faster and cheaper training
Training Your Favorite Transformers on Cloud TPUs using PyTorch / XLA. The PyTorch-TPU project originated …
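PyTorch / XLA exposes each TPU core as a device that models and tensors can be moved to, much like a GPU. A minimal sketch, assuming the torch_xla package and a TPU runtime are available (the model choice is illustrative):

```python
import torch_xla.core.xla_model as xm
from transformers import BertForSequenceClassification

device = xm.xla_device()  # the current TPU core
model = BertForSequenceClassification.from_pretrained("bert-base-uncased").to(device)

# Inside a training loop, xm.optimizer_step(optimizer) replaces
# optimizer.step() so gradients are reduced across TPU cores first.
```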
Retrieval Augmented Generation with Huggingface Transformers and Ray
A guest blog post by Amog Kamsetty from the Anyscale team. Hugging Face Transformers recently added the Retrieval Augmented Generation (RAG) model, a new NLP architecture that leverages external documents (like Wikipedia) to augment its knowledge and achieve state-of-the-art …
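A minimal sketch of querying RAG through 🤗 Transformers, close to the library's own documented example (it additionally requires the datasets and faiss packages; the dummy-dataset flag keeps the retrieval index small instead of downloading the full Wikipedia one):

```python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

# The retriever fetches supporting passages, which condition the generator.
inputs = tokenizer("who holds the record in 100m freestyle", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```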
Hugging Face Reads, Feb. 2021 – Long-range Transformers
Efficient Transformers taxonomy, from “Efficient Transformers: A Survey” by Tay et al.
Co-written by Teven Le Scao, Patrick von Platen, Suraj Patil, Yacine Jernite and Victor Sanh. Each month, we will choose a topic to focus on, reading a set of four papers recently published on the subject. We will then write a short blog post summarizing …
Fine-Tune Wav2Vec2 for English ASR with 🤗 Transformers
Wav2Vec2 is a pretrained model for Automatic Speech Recognition (ASR) and was released in September 2020 by Alexei Baevski, Michael Auli, and Alex Conneau. Using a novel contrastive pretraining objective, Wav2Vec2 learns powerful speech representations from more than 50,000 hours of unlabeled speech. Similar to BERT’s masked language …
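A minimal sketch of inference with the fine-tuned 960h checkpoint the post builds on; the audio here is random data at 16 kHz just to show the call shapes, so the decoded output is meaningless:

```python
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech = torch.randn(16000).numpy()  # 1 second of fake audio at 16 kHz
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # per-frame character logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))  # greedy CTC decoding
```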