mixtral offloading
1.0.0
This project implements efficient inference of Mixtral-8x7B models.
We achieve this through a combination of techniques; for more detailed information about our methods and results, please refer to our tech-report.
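As a general illustration of the kind of expert offloading such a setup relies on, here is a toy sketch of keeping only a bounded number of MoE expert weights "on GPU" and evicting the least-recently-used one back to "CPU" when a new expert is activated. All names are hypothetical and the two "devices" are plain dicts; this is not the repo's actual implementation, which is described in the tech-report.

```python
from collections import OrderedDict

class ExpertLRUCache:
    """Toy sketch: hold at most `capacity` expert weight sets 'on GPU'
    (an OrderedDict standing in for device memory) and evict the
    least-recently-used expert back to 'CPU' (a plain dict) on overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.gpu = OrderedDict()  # expert_id -> weights currently resident
        self.cpu = {}             # expert_id -> weights offloaded to host

    def register(self, expert_id, weights):
        # All experts start offloaded on the host.
        self.cpu[expert_id] = weights

    def fetch(self, expert_id):
        if expert_id in self.gpu:
            # Cache hit: mark this expert as most recently used.
            self.gpu.move_to_end(expert_id)
            return self.gpu[expert_id]
        if len(self.gpu) >= self.capacity:
            # Evict the least-recently-used expert back to host memory.
            evicted_id, evicted_w = self.gpu.popitem(last=False)
            self.cpu[evicted_id] = evicted_w
        # Bring the requested expert onto the 'GPU'.
        self.gpu[expert_id] = self.cpu.pop(expert_id)
        return self.gpu[expert_id]
```

In a real pipeline the dict moves would be device-to-host tensor copies, and the cache would be consulted by the MoE layer's router before each expert forward pass.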
To try this demo, please use the demo notebook: ./notebooks/demo.ipynb.
For now, there is no command-line script for running the model locally, but you can create one using the demo notebook as a reference. Contributions are welcome!
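If you want to write such a script, a minimal starting point is an argument-parsing skeleton like the one below. Every flag name and default here is hypothetical; the generation step itself should mirror whatever the demo notebook actually does.

```python
import argparse

def build_parser():
    # Hypothetical flags for a local inference script; adjust them to
    # match the setup used in ./notebooks/demo.ipynb.
    parser = argparse.ArgumentParser(
        description="Run Mixtral-8x7B inference locally (sketch)")
    parser.add_argument("--model", default="mistralai/Mixtral-8x7B-Instruct-v0.1",
                        help="model repo or local path (assumed default)")
    parser.add_argument("--offload-per-layer", type=int, default=4,
                        help="hypothetical: experts per layer kept offloaded")
    parser.add_argument("--max-new-tokens", type=int, default=128,
                        help="number of tokens to generate")
    parser.add_argument("prompt", help="text prompt to complete")
    return parser

if __name__ == "__main__":
    # Example invocation with explicit argv; a real script would call
    # build_parser().parse_args() and then build the offloaded model,
    # tokenize the prompt, and generate, as in the demo notebook.
    args = build_parser().parse_args(["--max-new-tokens", "32", "Hello!"])
    print(args.prompt, args.max_new_tokens)
```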
Some techniques described in our tech-report are not yet available in this repo; we are actively working on adding support for these upcoming features in the near future.