Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
LLMs are known to be large, and running or training them on consumer hardware is a major challenge for users and accessibility.
Our LLM.int8 blogpost showed how the techniques in the LLM.int8 paper were integrated in transformers using the bitsandbytes library.
As we strive to make models even more accessible to anyone, we decided to collaborate with bitsandbytes again to allow users to run models in 4-bit precision. This includes a large majority of HF models, in any modality (text, vision, multi-modal, etc.). Users can also train adapters on top of 4-bit models leveraging tools from the Hugging Face ecosystem.
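To make this concrete, here is a minimal sketch of what this looks like in practice: loading a model in 4-bit precision through a `BitsAndBytesConfig` and attaching a small trainable LoRA adapter with PEFT. The model id and the LoRA hyperparameters below are illustrative choices, not prescriptions, and the snippet assumes a recent `peft` release that provides `prepare_model_for_kbit_training`.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization config: NF4 storage with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "facebook/opt-350m" is an illustrative model id; any causal LM on the Hub works
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training and attach a LoRA adapter;
# the base weights stay frozen in 4-bit, only the adapter is trained
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8,  # adapter rank (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The resulting `model` can be passed to a standard training loop or a `Trainer`; only the adapter parameters receive gradients, which is what keeps fine-tuning feasible on consumer hardware.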