DBRX is a large language model trained by Databricks, and made available under an open license. This repository contains the minimal code and examples to run inference, as well as a collection of resources and links for using DBRX.
Reference model code can be found in this repository at modeling_dbrx.py.
Note: this model code is supplied for reference purposes only; please see the Hugging Face repository for the official, supported version.
DBRX is a Mixture-of-Experts (MoE) model with 132B total parameters and 36B active parameters. It uses 16 experts, of which 4 are active per token during training and inference. DBRX was pre-trained on 12T tokens of text and has a context length of 32K tokens.
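The routing step behind those numbers can be sketched in a few lines. The function below is a minimal, illustrative top-k router (the names and numbers are ours, not from the DBRX implementation): a token's router scores over the 16 experts are softmaxed, the top 4 are kept, and their gate weights are renormalized before the expert outputs are combined.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, k=4):
    """Pick the top-k experts for one token and renormalize their weights.

    Illustrative sketch only: DBRX selects 4 of 16 experts per token,
    but this is not the actual DBRX routing code.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# One token's router scores over 16 experts (made-up values):
selected = route([0.1 * i for i in range(16)])
print(selected)  # 4 (expert_index, gate_weight) pairs
```

Only the 4 selected experts run for that token, which is why 36B of the 132B parameters are active per forward pass.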
The following models are open-sourced:
| Model | Description |
|---|---|
| DBRX Base | Pre-trained base model |
| DBRX Instruct | Finetuned model for instruction following |
The model was trained using optimized versions of our open source libraries Composer, LLM Foundry, MegaBlocks and Streaming.
For the instruct model, we used the ChatML format. Please see the DBRX Instruct model card for more information on this.
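ChatML wraps each message in role-tagged delimiters. The helper below is a generic ChatML sketch (the function name is ours); for the exact DBRX Instruct template, including its default system prompt, use the tokenizer's chat template from the Hugging Face repository.

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts in generic ChatML.

    Illustrative only -- prefer the official chat template shipped
    with the DBRX Instruct tokenizer.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    parts.append("<|im_start|>assistant")  # cue the model to respond
    return "\n".join(parts) + "\n"

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is DBRX?"},
])
print(prompt)
```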
To download the weights and tokenizer, please first visit the DBRX Hugging Face page and accept the license. Note: access to the Base model requires manual approval.
We recommend having at least 320GB of memory to run the model.
Then, run:
```shell
pip install -r requirements.txt  # or requirements-gpu.txt to use flash attention on GPU(s)
huggingface-cli login            # add your Hugging Face token in order to access the model
python generate.py               # see generate.py to change the prompt and other settings
```
For more advanced usage, please see LLM Foundry (chat script, batch generation script).
If you have any package installation issues, we recommend using our Docker image: `mosaicml/llm-foundry:2.2.1_cu121_flash2-latest`
Both TensorRT-LLM and vLLM can be used to run optimized inference with DBRX. We have tested both libraries on NVIDIA A100 and H100 systems. To run inference at 16-bit precision, a multi-GPU system with at least 4 x 80GB GPUs is required.
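The 4 x 80GB figure follows from back-of-the-envelope arithmetic on the parameter count. The sketch below counts weights only; activations and the KV cache add further overhead, which is why the headroom matters.

```python
# Rough memory estimate for serving DBRX at 16-bit precision.
total_params = 132e9       # DBRX total parameters (all experts are resident)
bytes_per_param = 2        # fp16 / bf16

weights_gb = total_params * bytes_per_param / 1e9
available_gb = 4 * 80      # 4 x 80GB GPUs

print(f"weights alone: {weights_gb:.0f} GB")   # 264 GB
print(f"available:     {available_gb} GB")     # 320 GB
```

Note that all 132B parameters must be resident even though only 36B are active per token, since any expert can be selected.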
DBRX support is being added to the TensorRT-LLM library: Pending PR
Once merged, instructions to build and run DBRX TensorRT engines will be found at: README
Please see the vLLM docs for instructions on how to run DBRX with the vLLM engine.
If you have an Apple laptop with a sufficiently powerful M-series chip, a quantized version of DBRX can be run with MLX. See instructions for running DBRX on MLX here.
If you have an Apple M-series laptop with at least 64GB of RAM, you can run a quantized version of DBRX using llama.cpp.
```shell
# -ngl 41: offload 41 layers to the GPU; -m: path to the quantized GGUF model;
# -n 256: generate up to 256 tokens; -i: interactive mode; -r "User:": reverse prompt
./main -ngl 41 -m ./models/ggml-dbrx-instruct-16x12b-iq1_s.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
```
To finetune DBRX with our open source library LLM Foundry, please see the instructions in our training script (found here). We have finetuning support for both:
- Full parameter finetuning
- LoRA finetuning
Note: LoRA currently cannot finetune the experts, since the expert weights are fused. Stay tuned for more.
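Why fusing blocks LoRA can be seen from how the parameters are laid out. The toy sketch below (parameter names and sizes are made up for illustration) contrasts the two layouts: LoRA injects low-rank adapters into individually named linear weights, but a fused MoE layer stores all experts in one stacked tensor, leaving no per-expert module for an adapter to target.

```python
# Two ways to store MoE expert weights, with toy dimensions.
n_experts, d_in, d_out = 16, 4, 4

# Unfused: one named weight per expert (e.g. "experts.3.w1").
# Each is an individually addressable linear weight -- LoRA-friendly.
unfused = {f"experts.{i}.w1": [[0.0] * d_out for _ in range(d_in)]
           for i in range(n_experts)}

# Fused: a single stacked parameter holding all experts at once,
# addressable only as one tensor (e.g. "experts.w1").
fused = {"experts.w1": [[[0.0] * d_out for _ in range(d_in)]
                        for _ in range(n_experts)]}

print(len(unfused), "named expert weights vs", len(fused), "fused tensor")
```

Fusing the experts into one tensor enables efficient batched matmuls across experts, which is why it is used despite the adapter limitation.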
The model cards can be found at:
DBRX is available on the Databricks platform through:
Other providers have recently added support for DBRX:
The same tools used to train high-quality MoE models such as DBRX are available for Databricks customers. Please reach out to us at https://www.databricks.com/company/contact if you are interested in pre-training, finetuning, or deploying your own DBRX models!
For issues with model output, or community discussion, please use the Hugging Face community forum (instruct, base)
For issues with LLM Foundry, or any of the underlying training libraries, please open an issue on the relevant GitHub repository.
Our model weights and code are licensed for both researchers and commercial entities. The Databricks Open Model License can be found at LICENSE, and our Acceptable Use Policy can be found here.