LitGPT source code download (v0.5.3)

⚡ LitGPT

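20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.

✅ From-scratch implementations ✅ No abstractions ✅ Beginner friendly
✅ Flash attention ✅ FSDP ✅ LoRA, QLoRA, Adapter
✅ Reduce GPU memory (fp4/8/16/32) ✅ 1-1000+ GPUs/TPUs ✅ 20+ LLMs

Quick start • Models • Finetune • Deploy • All workflows • Features • Recipes (YAML) • Lightning AI • Tutorials

Use, finetune, pretrain, and deploy LLMs Lightning fast ⚡⚡

Every LLM is implemented from scratch with no abstractions and full control, making them blazing fast, minimal, and performant at enterprise scale.

✅ Enterprise ready - Apache 2.0 for unlimited enterprise use.
✅ Developer friendly - Easy debugging with no abstraction layers and single-file implementations.
✅ Optimized performance - Models designed to maximize performance, reduce costs, and speed up training.
✅ Proven recipes - Highly optimized training/finetuning recipes tested at enterprise scale.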

Quick start

Install LitGPT

pip install 'litgpt[all]'

Load and use any of the 20+ LLMs:

from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
text = llm.generate("Fix the spelling: Every fall, the familly goes to the mountains.")
print(text)
# Corrected Sentence: Every fall, the family goes to the mountains.

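✅ Optimized for fast inference
✅ Quantization
✅ Runs on low-memory GPUs
✅ No layers of internal abstractions
✅ Optimized for production scale

Advanced install options

Install from source:

git clone https://github.com/Lightning-AI/litgpt
cd litgpt
pip install -e '.[all]'

Explore the full Python API docs.

Choose from 20+ LLMs

Every model is written from scratch to maximize performance and remove layers of abstraction: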

Model	Model size	Author	Reference
Llama 3, 3.1, 3.2	1B, 3B, 8B, 70B, 405B	Meta AI	Meta AI 2024
Code Llama	7B, 13B, 34B, 70B	Meta AI	Rozière et al. 2023
Mixtral MoE	8x7B	Mistral AI	Mistral AI 2023
Mistral	7B, 123B	Mistral AI	Mistral AI 2023
CodeGemma	7B	Google	Google Team, Google Deepmind
Gemma 2	2B, 9B, 27B	Google	Google Team, Google Deepmind
Phi 3 & 3.5	3.8B	Microsoft	Abdin et al. 2024
...	...	...	...

See the full list of 20+ LLMs

All models

Model	Model size	Author	Reference
CodeGemma	7B	Google	Google Team, Google Deepmind
Code Llama	7B, 13B, 34B, 70B	Meta AI	Rozière et al. 2023
Falcon	7B, 40B, 180B	TII UAE	TII 2023
FreeWilly2 (Stable Beluga 2)	70B	Stability AI	Stability AI 2023
Function Calling Llama 2	7B	Trelis	Trelis et al. 2023
Gemma	2B, 7B	Google	Google Team, Google Deepmind
Gemma 2	9B, 27B	Google	Google Team, Google Deepmind
Llama 2	7B, 13B, 70B	Meta AI	Touvron et al. 2023
Llama 3.1	8B, 70B	Meta AI	Meta AI 2024
Llama 3.2	1B, 3B	Meta AI	Meta AI 2024
Mathstral	7B	Mistral AI	Mistral AI 2024
MicroLlama	300M	Ken Wang	MicroLlama repo
Mixtral MoE	8x7B	Mistral AI	Mistral AI 2023
Mistral	7B, 123B	Mistral AI	Mistral AI 2023
OLMo	1B, 7B	Allen Institute for AI (AI2)	Groeneveld et al. 2024
OpenLLaMA	3B, 7B, 13B	OpenLM Research	Geng & Liu 2023
Phi 1.5 & 2	1.3B, 2.7B	Microsoft Research	Li et al. 2023
Phi 3	3.8B	Microsoft Research	Abdin et al. 2024
Platypus	7B, 13B, 70B	Lee et al.	Lee, Hunter, and Ruiz 2023
Pythia	{14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B	EleutherAI	Biderman et al. 2023
StableCode	3B	Stability AI	Stability AI 2023
StableLM	3B, 7B	Stability AI	Stability AI 2023
StableLM Zephyr	3B	Stability AI	Stability AI 2023
TinyLlama	1.1B	Zhang et al.	Zhang et al. 2023

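Tip: You can list all available models by running the litgpt download list command.

Workflows

Finetune • Pretrain • Continued pretraining • Evaluate • Deploy • Test

Use the command line interface to run advanced workflows such as pretraining or finetuning on your own data.

All workflows

After installing LitGPT, select the model and workflow to run (finetune, pretrain, evaluate, deploy, etc...):

# litgpt [action] [model]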
litgpt  serve     meta-llama/Llama-3.2-3B-Instruct
litgpt  finetune  meta-llama/Llama-3.2-3B-Instruct
litgpt  pretrain  meta-llama/Llama-3.2-3B-Instruct
litgpt  chat      meta-llama/Llama-3.2-3B-Instruct
litgpt  evaluate  meta-llama/Llama-3.2-3B-Instruct

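Finetune an LLM

Finetuning is the process of taking a pretrained AI model and further training it on a smaller, specialized dataset tailored to a specific task or application.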

# 0) Set up your dataset
curl -L https://huggingface.co/datasets/ksaw008/finance_alpaca/resolve/main/finance_alpaca.json -o my_custom_dataset.json

# 1) Finetune a model (auto downloads weights)
litgpt finetune microsoft/phi-2 \
  --data JSON \
  --data.json_path my_custom_dataset.json \
  --data.val_split_fraction 0.1 \
  --out_dir out/custom-model

# 2) Test the model
litgpt chat out/custom-model/final

# 3) Deploy the model
litgpt serve out/custom-model/final
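
The finetuning command reads its examples from a JSON file. As a minimal sketch, assuming the file is a JSON array of Alpaca-style records with instruction and output keys plus an optional input key, a small custom dataset could be written as follows. The two records and the helper name write_dataset are hypothetical and only illustrate the layout.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records for illustration; real data would come from your own task.
records = [
    {
        "instruction": "Define compound interest.",
        "input": "",
        "output": "Interest earned on both the principal and previously accrued interest.",
    },
    {
        "instruction": "Summarize the text.",
        "input": "Bonds pay fixed coupons.",
        "output": "Bonds provide fixed payments.",
    },
]

# Assumption: "instruction" and "output" are required, "input" is optional.
REQUIRED_KEYS = {"instruction", "output"}


def write_dataset(rows, path):
    """Check each record for the required keys, then write one JSON array."""
    for i, row in enumerate(rows):
        missing = REQUIRED_KEYS - row.keys()
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
    path = Path(path)
    path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return path


out_path = write_dataset(records, Path(tempfile.gettempdir()) / "my_custom_dataset.json")
print(f"wrote {len(records)} records to {out_path}")
```

The resulting file path can then be passed to --data.json_path in place of the downloaded example.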

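Read the full finetuning docs

Deploy an LLM

Deploy a pretrained or finetuned LLM to use it in real-world applications. Deploying automatically sets up a web server that can be accessed by a website or app.

# deploy an out-of-the-box LLM
litgpt serve microsoft/phi-2

# deploy your own trained model
litgpt serve path/to/microsoft/phi-2/checkpoint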

Test the server in a separate terminal and integrate the model API into your AI product:

# 3) Use the server (in a separate Python session)
import requests, json
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Exampel input"}
)
print(response.json()["output"])
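
If the requests package is not available, the same call can be sketched with the standard library alone. The endpoint URL, the prompt field, and the output field are taken from the example above. The helper names build_request and query_server and the 30-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

SERVER_URL = "http://127.0.0.1:8000/predict"  # endpoint from the example above


def build_request(prompt, url=SERVER_URL):
    """Build a POST request that carries the prompt as a JSON body."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_server(prompt, url=SERVER_URL, timeout=30.0):
    """Send the prompt and return the "output" field of the JSON reply.

    Raises urllib.error.URLError if the server is not reachable.
    """
    request = build_request(prompt, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["output"]


# Building the request does not contact the server, so this runs offline.
request = build_request("Fix typos in the following sentence: Exampel input")
print(request.get_method(), request.full_url)
```

With litgpt serve running in another terminal, calling query_server with a prompt should return the generated text.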

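Read the full deploy docs.

Evaluate an LLM

Evaluate an LLM to test its performance on various tasks and see how well it understands and generates text. Simply put, we can evaluate how well it would do at things like college-level chemistry, coding, etc... (MMLU, Truthful QA, etc...)

litgpt evaluate microsoft/phi-2 --tasks 'truthfulqa_mc2,mmlu'

Read the full evaluation docs.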

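Test an LLM

Test how well the model works via an interactive chat. Use the chat command to chat, extract embeddings, etc...

Here's an example showing how to use the Phi-2 LLM:

litgpt chat microsoft/phi-2

>> Prompt: What do Llamas eat?

Full code:

# 1) List all supported LLMs
litgpt download list

# 2) Use a model (auto downloads weights)
litgpt chat microsoft/phi-2

>> Prompt: What do Llamas eat?

The download of certain models requires an additional access token. You can read more about this in the download documentation.

Read the full chat docs.

Pretrain an LLM

Pretraining is the process of teaching an AI model by exposing it to a large amount of data before it is finetuned for specific tasks.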

# 0) Download two example books as training text
mkdir -p custom_texts
curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt

# 1) Download a tokenizer
litgpt download EleutherAI/pythia-160m \
  --tokenizer_only True

# 2) Pretrain the model
litgpt pretrain EleutherAI/pythia-160m \
  --tokenizer_dir EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts/" \
  --train.max_tokens 10_000_000 \
  --out_dir out/custom-model

# 3) Test the model
litgpt chat out/custom-model/final

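Read the full pretraining docs

Continue pretraining an LLM

Continued pretraining is another way of finetuning that specializes an already pretrained model by training on custom data: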

# 0) Download two example books as training text
mkdir -p custom_texts
curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt

# 1) Continue pretraining a model (auto downloads weights)
litgpt pretrain EleutherAI/pythia-160m \
  --tokenizer_dir EleutherAI/pythia-160m \
  --initial_checkpoint_dir EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts/" \
  --train.max_tokens 10_000_000 \
  --out_dir out/custom-model

# 2) Test the model
litgpt chat out/custom-model/final

Read the full continued pretraining docs

State-of-the-art features

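✅ State-of-the-art optimizations: Flash Attention v2, multi-GPU support via fully-sharded data parallelism, optional CPU offloading, and TPU and XLA support.

✅ Pretrain, finetune, and deploy

✅ Reduce compute requirements with low-precision settings: FP16, BF16, and FP16/FP32 mixed.

✅ Lower memory requirements with quantization: 4-bit floats, 8-bit integers, and double quantization.

✅ Configuration files for great out-of-the-box performance.

✅ Parameter-efficient finetuning: LoRA, QLoRA, Adapter, and Adapter v2.

✅ Exporting to other popular model weight formats.

✅ Many popular datasets for pretraining and finetuning, and support for custom datasets.

✅ Readable and easy-to-modify code to experiment with the latest research ideas.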
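
To see why the precision and quantization settings in the list above matter, a back-of-the-envelope estimate of the memory needed for the weights alone can help. The sketch below assumes a 7B-parameter model purely for illustration. It ignores activations, gradients, optimizer state, and the small overhead of quantization metadata, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative parameter count: a 7B-parameter model.
num_params = 7e9
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_gb(num_params, bits):5.1f} GB")
```

Each halving of the bits per parameter halves the weight footprint, which is what lets larger models fit on low-memory GPUs.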

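Training recipes

LitGPT comes with validated recipes (YAML configs) to train models under different conditions. We've generated these recipes based on the parameters we found to perform the best for different training conditions.

Browse all training recipes here.

Example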

litgpt finetune \
  --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml

✅ Use configs to customize training

Configs let you customize training for all granular parameters like:

# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama2-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

...
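
Nested config sections correspond to the dotted flags used on the command line elsewhere in this document, such as --data.val_split_fraction and --train.max_tokens. The sketch below only illustrates that naming relationship by flattening a nested dictionary into dotted flags. The helper to_cli_flags is hypothetical and is not how LitGPT itself parses configs.

```python
def to_cli_flags(config, prefix=""):
    """Flatten a nested dict into a list like ["--a.b", "value", ...]."""
    flags = []
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into a section, extending the dotted prefix.
            flags.extend(to_cli_flags(value, prefix=f"{name}."))
        elif value is not None:
            # Unset (null) entries are skipped so defaults stay in effect.
            flags.extend([f"--{name}", str(value)])
    return flags


# Hypothetical config fragment for illustration.
config = {
    "precision": "bf16-true",
    "train": {"max_seq_length": 512, "max_steps": None},
    "data": {"val_split_fraction": 0.1},
}
print(" ".join(to_cli_flags(config)))
```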

✅ Example: LoRA finetuning config

# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama2-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
    download_dir: data/alpaca2k

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:

  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: float, default: 0.0003)
  learning_rate: 0.0002

  #   (type: float, default: 0.02)
  weight_decay: 0.0

  #   (type: float, default: 0.9)
  beta1: 0.9

  #   (type: float, default: 0.95)
  beta2: 0.95

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:

  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337
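
The config comments describe global_batch_size as the number of samples between optimizer steps across all data-parallel ranks, and micro_batch_size as the number of samples per rank in one forward/backward pass. Under that reading, the number of micro-batches each device accumulates before an optimizer step can be worked out as below. The helper accumulation_steps is a hypothetical illustration of the arithmetic, not LitGPT's own function.

```python
def accumulation_steps(global_batch_size: int, micro_batch_size: int, devices: int = 1) -> int:
    """Micro-batches each device processes before one optimizer step."""
    per_device = global_batch_size // devices
    if per_device < micro_batch_size or per_device % micro_batch_size != 0:
        raise ValueError("global batch must split evenly into micro-batches per device")
    return per_device // micro_batch_size


# Values from the config above: global_batch_size 8, micro_batch_size 2, devices 1.
steps = accumulation_steps(global_batch_size=8, micro_batch_size=2, devices=1)
print(steps)
```

With the values in this config, each optimizer step accumulates gradients over 4 micro-batches of 2 samples.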

✅ Override any parameter in the CLI:

litgpt finetune \
  --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml \
  --lora_r 4
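
To give a feel for what changing the rank from the config's 32 to 4 does, recall that LoRA learns a low-rank update B @ A for a frozen weight matrix, which adds rank * (d_in + d_out) trainable parameters per adapted matrix. The sketch below works through that arithmetic. The hidden size of 4096 is an assumption chosen for illustration, and lora_extra_params is a hypothetical helper.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank),
    so together they hold rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


d = 4096  # assumed hidden size, for illustration only
frozen = d * d  # parameters in the frozen square weight matrix
for rank in (32, 4):
    extra = lora_extra_params(d, d, rank)
    print(f"rank={rank:>2}: {extra:,} trainable vs {frozen:,} frozen ({extra / frozen:.2%})")
```

Lowering the rank shrinks the number of trainable parameters proportionally, at the cost of a less expressive update.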

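Project highlights

LitGPT powers many great AI projects, initiatives, challenges and of course enterprises. Please submit a pull request to be considered for a feature.

SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

The Samba project by researchers at Microsoft is built on top of the LitGPT code base and combines state space models with sliding window attention, which outperforms pure state space models.

NeurIPS 2023 Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day

The LitGPT repository was the official starter kit for the NeurIPS 2023 LLM Efficiency Challenge, a competition focused on finetuning an existing non-instruction-tuned LLM for 24 hours on a single GPU.

TinyLlama: An Open-Source Small Language Model

LitGPT powered the TinyLlama project and the TinyLlama: An Open-Source Small Language Model research paper.

MicroLlama: MicroLlama-300M

MicroLlama is a 300M Llama model pretrained on 50B tokens, powered by TinyLlama and LitGPT.

Pre-training Small Base LMs with Fewer Tokens

The research paper "Pre-training Small Base LMs with Fewer Tokens", which utilizes LitGPT, develops smaller base language models by inheriting a few transformer blocks from larger models and training on a tiny fraction of the data used by the larger models. It demonstrates that these smaller models can perform comparably to larger models despite using significantly less training data and resources.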

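Community

We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.

Request a feature
Submit your first contribution
Join our Discord

Tutorials

Get started
⚡️ Finetuning, incl. LoRA, QLoRA, and Adapters
Pretraining
Model evaluation
Supported and custom datasets
Quantization
Tips for dealing with out-of-memory (OOM) errors
Using cloud TPUs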

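Acknowledgments

This implementation extends Lit-LLaMA and nanoGPT, and it is powered by Lightning Fabric ⚡.

@karpathy for nanoGPT
@EleutherAI for GPT-NeoX and the Evaluation Harness
@TimDettmers for bitsandbytes
@Microsoft for LoRA
@tridao for Flash Attention 2

License

LitGPT is released under the Apache 2.0 license.

Citation

If you use LitGPT in your research, please cite the following work: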

@misc{litgpt-2023,
  author       = {Lightning AI},
  title        = {LitGPT},
  howpublished = {\url{https://github.com/Lightning-AI/litgpt}},
  year         = {2023},
}