mergoo is a library for easily merging multiple LLM experts and efficiently training the merged LLM. With mergoo, you can efficiently integrate the knowledge of different generic or domain-based LLM experts.
If you like the project, consider leaving a ⭐
Install with pip:
pip install mergoo
Install the latest unstable version from GitHub:
pip install git+https://github.com/Leeroo-AI/mergoo
Install from source:
git clone https://github.com/Leeroo-AI/mergoo
cd mergoo
pip install -e .
Specify the config for merging:
- model_type: type of the base model. Choices: mistral, llama, or bert.
- num_experts_per_tok: number of experts selected for each token in the MoE.
- experts: config for the experts to merge. Each entry includes an expert_name and a Hugging Face model_id.
- router_layers: layers chosen for applying Mixture-of-Experts.

This is a sample config when merging fully fine-tuned LLM experts.
config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
        {"expert_name": "expert_2", "model_id": "ajibawa-2023/Code-Mistral-7B"}
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"]
}
The example above merges Mistral-based math and code experts. Please refer to this notebook for further details!
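To make num_experts_per_tok concrete, here is a minimal pure-Python sketch of generic top-k routing for a single token. It illustrates the general technique only and is not mergoo's internal implementation; the helper names (top_k_route, moe_output) and all numbers are made up for the example.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their scores."""
    # Indices of the k largest logits, best first.
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Softmax over the selected logits only (subtract the max for stability).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(expert_outputs, routes):
    """Weighted sum of the selected experts' output vectors."""
    dim = len(expert_outputs[0])
    return [sum(w * expert_outputs[i][d] for i, w in routes) for d in range(dim)]


# Hypothetical router scores for one token over three experts
# (standing in for base_expert, expert_1, expert_2).
logits = [0.1, 2.0, 1.0]
routes = top_k_route(logits, k=2)  # k plays the role of num_experts_per_tok

# Hypothetical per-expert outputs for that token (2-dimensional for brevity).
outputs = [[1.0, 0.0], [0.0, 2.0], [4.0, 0.0]]
mixed = moe_output(outputs, routes)
print(routes, mixed)
```

With k=2, only the two best-scoring experts contribute to the token's output, and the third is skipped entirely.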
This is a sample config when merging LoRA fine-tuned LLM experts. mergoo builds a routing layer on top of the LoRAs, resulting in a mixture of adapters.
config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "base_model": "mistralai/Mistral-7B-v0.1",
    "experts": [
        {"expert_name": "adapter_1", "model_id": "predibase/customer_support"},
        {"expert_name": "adapter_2", "model_id": "predibase/customer_support_accounts"},
        {"expert_name": "adapter_3", "model_id": "predibase/customer_support_orders"},
        {"expert_name": "adapter_4", "model_id": "predibase/customer_support_payments"}
    ],
}
Note that each expert_name starts with adapter instead of expert. Please refer to this notebook for further details!
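Because the two layouts differ in small ways, a quick sanity check before composing can catch typos early. The helper below is a hypothetical sketch, not part of mergoo: check_merge_config is an invented name, and its rules are assumptions read off the two sample configs (a base_model key marks the LoRA layout, and the fully fine-tuned layout lists router_layers).

```python
def check_merge_config(config):
    """Return a list of problems found in a merge config (empty if none)."""
    problems = []
    for key in ("model_type", "num_experts_per_tok", "experts"):
        if key not in config:
            problems.append(f"missing key: {key}")

    experts = config.get("experts", [])
    for i, expert in enumerate(experts):
        for key in ("expert_name", "model_id"):
            if key not in expert:
                problems.append(f"experts[{i}] is missing {key}")

    # Routing k experts per token needs at least k experts to choose from.
    k = config.get("num_experts_per_tok")
    if isinstance(k, int) and experts and k > len(experts):
        problems.append("num_experts_per_tok exceeds the number of experts")

    if "base_model" in config:
        # Assumption: "base_model" marks the LoRA layout, whose names start with "adapter".
        for expert in experts:
            name = expert.get("expert_name", "")
            if not name.startswith("adapter"):
                problems.append(f"expected an adapter-style expert_name, got {name!r}")
    elif "router_layers" not in config:
        # Assumption: the fully fine-tuned layout lists router_layers, as in its sample.
        problems.append("missing key: router_layers")
    return problems


lora_config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "base_model": "mistralai/Mistral-7B-v0.1",
    "experts": [
        {"expert_name": "adapter_1", "model_id": "predibase/customer_support"},
        {"expert_name": "adapter_2", "model_id": "predibase/customer_support_accounts"},
    ],
}
print(check_merge_config(lora_config))
```

The function returns messages instead of raising, so the rules can be treated as warnings and adjusted if the library accepts configs that this sketch would flag.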
Following the config setup, mergoo creates the merged LLM as:
import torch
from mergoo.compose_experts import ComposeExperts
# create checkpoint
model_id = "data/mistral_lora_moe"
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint(model_id)
Now, you can easily train the merged LLM with Hugging Face Trainer:
from transformers import Trainer
from mergoo.models.modeling_mistral import MistralForCausalLM
model = MistralForCausalLM.from_pretrained("data/mistral_lora_moe")
# NOTE: the 'gate' / router layers are untrained, so a weight-loading warning will appear for them
trainer = Trainer( ... )
trainer.train()
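Since only the router layers start out untrained, one option (a common pattern, not something this README prescribes) is to freeze every other parameter before building the Trainer. The sketch below relies only on the standard PyTorch named_parameters() and requires_grad_() methods. The helper name freeze_all_but_router is hypothetical, and the assumption that router parameters have a name segment equal to "gate" should be checked against the names your model actually reports.

```python
def freeze_all_but_router(model, router_key="gate"):
    """Disable gradients for parameters that are not router parameters.

    A parameter counts as a router parameter when one dot-separated segment
    of its name equals router_key (an assumed naming pattern). Matching whole
    segments avoids also matching names such as "gate_proj".
    Returns (trainable, total) counts of parameter tensors.
    """
    trainable, total = 0, 0
    for name, param in model.named_parameters():
        total += 1
        if router_key in name.split("."):
            trainable += 1                # leave router weights trainable
        else:
            param.requires_grad_(False)   # freeze everything else in place
    return trainable, total
```

Calling freeze_all_but_router(model) on the model loaded above and printing the returned counts is a quick way to confirm that the pattern matched the layers you intended before training starts.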
After finishing the Quick Start guide, you can explore the tutorials below to further familiarize yourself with mergoo.
Notebook | Details |
---|---|
MoE with fully fine-tuned LLM experts | Build a unified Mixture-of-Experts model with fully fine-tuned experts. Inspired by BTX Research (Meta AI). |
MoE with LoRA fine-tuned experts | Build a Mixture of Adapters expert. Inspired by xlora, Mixture-of-LoRAs, MoLE, PHATGOOSE, and MoELoRA. |
Hugging Face Blog | Deep dive into the research details behind the merging methods of the mergoo library |
LLaMa3-based Experts | Build your own MoE-style LLM experts by integrating LLaMa3-based domain experts |
Phi3-based Experts | Create MoE-style LLM architecture by merging Phi3-based fine-tuned models |
As an open-source library in a fast-evolving domain, we welcome contributions, whether that means introducing new features, enhancing infrastructure, or improving documentation.
Here is the mergoo roadmap:
Feel free to suggest new features and/or contribute to the mergoo roadmap!
We would love to hear your feedback. Please join the Leeroo community:
Have a question not listed here? Open a GitHub Issue or send us an email!