Welcome to the Falcon 3 Family of Open Models!

We introduce Falcon3, a family of decoder-only large language models under 10 billion parameters, developed by the Technology Innovation Institute (TII) in Abu Dhabi. By pushing the boundaries of performance and training efficiency, this release reflects our ongoing commitment to advancing open and accessible large foundation models. Falcon3 represents a natural evolution from previous releases, […]

Read more

Bamba: Inference-Efficient Hybrid Mamba2 Model 🐍

We introduce Bamba-9B, an inference-efficient Hybrid Mamba2 model trained by IBM, Princeton, CMU, and UIUC on completely open data. At inference time, the model demonstrates 2.5x throughput improvement and 2x latency speedup compared to standard transformers in vLLM. To foster community experimentation, the model is immediately available to use in transformers, vLLM, TRL, and llama.cpp. We also release tuning, training, and extended pretraining recipes with a stateful data loader, and invite the community to further improve this model. Let’s overcome […]

Read more

Finally, a Replacement for BERT

This blog post introduces ModernBERT, a family of state-of-the-art encoder-only models representing improvements over older-generation encoders across the board, with an 8192-token sequence length, better downstream performance, and much faster processing. ModernBERT is available as a slot-in replacement for any BERT-like model, in both a base (149M params) and a large (395M params) size. ModernBERT will be included in v4.48.0 of transformers. Until then, it requires installing transformers from […]

Read more

Visualize and understand GPU memory in PyTorch

You must be familiar with this message 🤬: RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.93 GiB total capacity; 6.00 GiB already allocated; 14.88 MiB free; 6.00 GiB reserved in total by PyTorch) While it’s easy to see that GPU memory is full, understanding why and how to fix it can be more challenging. In […]
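As a quick aside, the numbers in that error message already tell most of the story. A minimal sketch of the arithmetic, using the figures from the example message above (the breakdown of the "outside PyTorch" remainder into CUDA context, other processes, and fragmentation is an assumption about typical causes, not something the message itself reports):

```python
# Arithmetic behind the example CUDA OOM message (all values in bytes).
GIB = 1024 ** 3
MIB = 1024 ** 2

total_capacity = 7.93 * GIB   # GPU 0 total capacity
allocated      = 6.00 * GIB   # tensors currently allocated by PyTorch
reserved       = 6.00 * GIB   # held by PyTorch's caching allocator
free           = 14.88 * MIB  # what the allocator can still hand out
requested      = 20.00 * MIB  # the allocation that failed

# Memory consumed outside PyTorch's reservation (typically the CUDA
# context, other processes, and allocator rounding) accounts for the rest:
outside = total_capacity - reserved - free
print(f"outside PyTorch: {outside / GIB:.2f} GiB")  # ~1.92 GiB

# The request fails because it exceeds what is immediately free:
print(requested > free)  # True
```

Note that "allocated" and "reserved" being equal here means the caching allocator has no spare reserved blocks to satisfy the 20 MiB request, which is why only the 14.88 MiB headroom matters.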

Read more

Introducing smolagents, a simple library to build agents

Today we are launching smolagents, a very simple library that unlocks agentic capabilities for language models. Here’s a glimpse: from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel()) agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?") […]

Read more

CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard

Since June 2024, we have evaluated more than 3,000 models on the Open LLM Leaderboard, a worldwide ranking of open language model performance. Even though we try to run evaluations without wasting resources (we use the spare cycles of our cluster, in other words the GPUs that are active but waiting between jobs), this still represents a substantial amount of energy spent on model inference! In the last year, people have become more and more aware that using large […]

Read more

AI Agents Are Here. What Now?

Introduction The sudden, rapid advancement of LLM capabilities – such as writing fluent sentences and achieving increasingly high scores on benchmarks – has led AI developers and businesses alike to look toward what comes next: What game-changing technology is just on the horizon? One technology taking off very recently is “AI agents”, systems that can take actions in the digital world aligned with a deployer’s goals. Most of today’s AI agents […]

Read more