Simplified Chinese | English
PaddleNLP is a large language model (LLM) development toolkit built on the PaddlePaddle deep learning framework. It supports efficient large-model training, lossless compression, and high-performance inference on a wide range of hardware. Easy to use and highly performant, PaddleNLP is committed to helping developers build efficient, industrial-grade LLM applications.
2024.08.08 "PaddleNLP 3.0, a powerful tool for industrial-level large language model development, is released" , with the entire process of training, pressure and push, and full coverage of mainstream models. Large models are automatically parallelized, and the entire process of training and pushing hundreds of billions of models is available out of the box. Provides industrial-grade high-performance fine-tuning and alignment solutions, leading compression inference, and multi-hardware adaptation. Covering application scenarios such as industrial-level intelligent assistants, content creation, knowledge Q&A, and key information extraction. Live broadcast time: August 22 (Thursday) 19:00. Registration link: https://www.wjx.top/vm/Y2f7FFY.aspx?udsid=143844
2024.06.27 PaddleNLP v3.0 Beta: embrace large models with a fully upgraded experience. The LLM suite is unified, with full-pipeline support for domestic AI accelerators; full support for industrial-grade LLM workflows such as PaddlePaddle 4D parallel configuration, efficient fine-tuning strategies, efficient alignment algorithms, and high-performance inference. The self-developed, fast-converging RsLoRA+ algorithm, the automatically resharding Unified Checkpoint storage mechanism, and the universally supported FastFFN and FusedQKV optimizations accelerate LLM training and inference. Mainstream models receive continued support and updates with efficient solutions.
2024.04.24 PaddleNLP v2.8: the self-developed, fast-converging RsLoRA+ algorithm greatly improves PEFT training convergence speed and quality; high-performance generation acceleration is introduced into the RLHF PPO algorithm, removing the generation-speed bottleneck in PPO training and putting PPO training performance clearly in the lead. Universal support for training optimizations such as FastFFN and FusedQKV makes large-model training faster and more stable.
Supports training and inference of large language models and natural language understanding models on NVIDIA GPU, Kunlunxin XPU, Ascend NPU, Enflame GCU, Hygon DCU, and other hardware. The suite's interface supports fast hardware switching, significantly reducing the R&D cost of migrating between devices. Currently supported natural language understanding models: List of multi-hardware natural language understanding models
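As an illustration of what hardware switching looks like in user code, here is a minimal sketch that only changes the device string before loading a model. The Auto API calls mirror the quick-start example below; the non-GPU device strings ("xpu", "npu", ...) are assumptions that require the corresponding PaddlePaddle build or custom-device plugin.

```python
import paddle
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

# Select the target device before building the model. "gpu" targets NVIDIA GPUs;
# strings such as "xpu" or "npu" are assumptions that depend on having the
# matching PaddlePaddle build / custom-device plugin installed.
paddle.set_device("gpu")

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
# The rest of the script runs unchanged after switching the device string above.
```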
Supports 4D high-performance training combining data parallelism, sharded data parallelism with grouped parameter slicing, tensor model parallelism, and pipeline model parallelism. The Trainer supports configurable distributed strategies, reducing the usage cost of composing complex distributed setups; the Unified Checkpoint storage format for large models supports dynamic resharding of model parameters across training configurations, reducing the migration cost of hardware switching.
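To make the configuration surface concrete, the sketch below writes the distributed-strategy portion of a training config in the style of the `pretrain_argument.json` files used later in this README. The field names mirror commonly used PaddleNLP Trainer arguments, but treat both the names and the example values as assumptions and check the LLM documentation for the authoritative schema.

```python
import json

# Hypothetical distributed-strategy fields for an example 8-GPU layout
# (assumption: field names follow the PaddleNLP Trainer arguments; verify
# against the documentation for your PaddleNLP version).
distributed_config = {
    "tensor_parallel_degree": 2,    # tensor model parallelism
    "pipeline_parallel_degree": 2,  # pipeline model parallelism
    "sharding": "stage2",           # grouped parameter slicing (stage1/2/3)
    "sharding_parallel_degree": 2,  # size of each sharding group
    "sequence_parallel": True,      # sequence parallelism on top of tensor parallelism
    "unified_checkpoint": True,     # Unified Checkpoint storage format
}

# Merge these fields into your full training config before launching run_pretrain.py.
with open("my_pretrain_argument.json", "w") as f:
    json.dump(distributed_config, f, indent=2)
```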
The fine-tuning algorithms deeply combine a zero-padding data flow with the high-performance FlashMask operator, reducing the padding and computation spent on invalid (padded) tokens and greatly improving fine-tuning throughput.
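The toy calculation below, which is plain arithmetic rather than PaddleNLP API, illustrates why removing padding matters: with naive right-padding every sample costs `max_len` tokens of compute, whereas a zero-padding data flow only pays for the real tokens.

```python
# Toy illustration (not PaddleNLP API): tokens processed with naive right-padding
# versus a zero-padding data flow that only computes over real tokens.
seq_lens = [37, 512, 91, 260, 1024, 48, 700, 333]  # hypothetical sample lengths
max_len = 1024

padded_tokens = len(seq_lens) * max_len  # every sample padded up to max_len
packed_tokens = sum(seq_lens)            # only the real tokens are processed

print(f"padded: {padded_tokens} tokens")
print(f"packed: {packed_tokens} tokens")
print(f"wasted: {1 - packed_tokens / padded_tokens:.1%} of compute spent on padding")
```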
The high-performance inference module of the LLM suite has built-in dynamic insertion and full-pipeline operator-fusion strategies, which greatly speed up parallel inference. The underlying implementation details are encapsulated, providing out-of-the-box high-performance parallel inference.
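As a conceptual aid, the toy loop below mimics the dynamic-insertion idea: finished sequences leave the running batch immediately and waiting requests take their slots, rather than the whole batch waiting for its slowest member. It is a simplified illustration, not the suite's actual scheduler.

```python
from collections import deque

# Conceptual toy of dynamic insertion (not PaddleNLP code).
waiting = deque([("req3", 5), ("req4", 2), ("req5", 7)])  # (request id, tokens left to generate)
running = [("req1", 3), ("req2", 1)]                       # current decoding batch
max_batch = 2

step = 0
while running or waiting:
    step += 1
    # Decode one token for every running request.
    running = [(rid, left - 1) for rid, left in running]
    # Evict finished requests and insert waiting ones right away,
    # instead of waiting for the entire batch to finish.
    running = [(rid, left) for rid, left in running if left > 0]
    while waiting and len(running) < max_batch:
        running.append(waiting.popleft())
    print(f"step {step}: running = {[rid for rid, _ in running]}")
```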
Model parameters are already supported for the LLaMA series, Baichuan series, Bloom series, ChatGLM series, Gemma series, Mistral series, OPT series, and Qwen series. The detailed [LLM] model parameter support list is as follows:
Model series | Model name |
---|---|
LLAMA | facebook/llama-7b, facebook/llama-13b, facebook/llama-30b, facebook/llama-65b |
LLama2 | meta-llama/Llama-2-7b, meta-llama/Llama-2-7b-chat, meta-llama/Llama-2-13b, meta-llama/Llama-2-13b-chat, meta-llama/Llama-2-70b, meta-llama/Llama-2-70b-chat |
LLama3 | meta-llama/Meta-Llama-3-8B, meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B, meta-llama/Meta-Llama-3-70B-Instruct |
LLama3.1 | meta-llama/Meta-Llama-3.1-8B, meta-llama/Meta-Llama-3.1-8B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3.1-70B-Instruct, meta-llama/Meta-Llama-3.1-405B, meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Llama-Guard-3-8B |
LLama3.2 | meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B, meta-llama/Llama-3.2-3B-Instruct, meta-llama/Llama-Guard-3-1B |
Baichuan | baichuan-inc/Baichuan-7B, baichuan-inc/Baichuan-13B-Base, baichuan-inc/Baichuan-13B-Chat |
Baichuan2 | baichuan-inc/Baichuan2-7B-Base, baichuan-inc/Baichuan2-7B-Chat, baichuan-inc/Baichuan2-13B-Base, baichuan-inc/Baichuan2-13B-Chat |
Bloom | bigscience/bloom-560m, bigscience/bloom-560m-bf16, bigscience/bloom-1b1, bigscience/bloom-3b, bigscience/bloom-7b1, bigscience/bloomz-560m, bigscience/bloomz-1b1, bigscience/bloomz-3b, bigscience/bloomz-7b1-mt, bigscience/bloomz-7b1-p3, bigscience/bloomz-7b1, bellegroup/belle-7b-2m |
ChatGLM | THUDM/chatglm-6b, THUDM/chatglm-6b-v1.1 |
ChatGLM2 | THUDM/chatglm2-6b |
ChatGLM3 | THUDM/chatglm3-6b |
Gemma | google/gemma-7b, google/gemma-7b-it, google/gemma-2b, google/gemma-2b-it |
Mistral | mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-7B-v0.1 |
Mixtral | mistralai/Mixtral-8x7B-Instruct-v0.1 |
OPT | facebook/opt-125m, facebook/opt-350m, facebook/opt-1.3b, facebook/opt-2.7b, facebook/opt-6.7b, facebook/opt-13b, facebook/opt-30b, facebook/opt-66b, facebook/opt-iml-1.3b, opt-iml-max-1.3b |
Qwen | qwen/qwen-7b, qwen/qwen-7b-chat, qwen/qwen-14b, qwen/qwen-14b-chat, qwen/qwen-72b, qwen/qwen-72b-chat |
Qwen1.5 | Qwen/Qwen1.5-0.5B, Qwen/Qwen1.5-0.5B-Chat, Qwen/Qwen1.5-1.8B, Qwen/Qwen1.5-1.8B-Chat, Qwen/Qwen1.5-4B, Qwen/Qwen1.5-4B-Chat, Qwen/Qwen1.5-7B, Qwen/Qwen1.5-7B-Chat, Qwen/Qwen1.5-14B, Qwen/Qwen1.5-14B-Chat, Qwen/Qwen1.5-32B, Qwen/Qwen1.5-32B-Chat, Qwen/Qwen1.5-72B, Qwen/Qwen1.5-72B-Chat, Qwen/Qwen1.5-110B, Qwen/Qwen1.5-110B-Chat, Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat |
Qwen2 | Qwen/Qwen2-0.5B, Qwen/Qwen2-0.5B-Instruct, Qwen/Qwen2-1.5B, Qwen/Qwen2-1.5B-Instruct, Qwen/Qwen2-7B, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-72B, Qwen/Qwen2-72B-Instruct, Qwen/Qwen2-57B-A14B, Qwen/Qwen2-57B-A14B-Instruct |
Qwen2-Math | Qwen/Qwen2-Math-1.5B, Qwen/Qwen2-Math-1.5B-Instruct, Qwen/Qwen2-Math-7B, Qwen/Qwen2-Math-7B-Instruct, Qwen/Qwen2-Math-72B, Qwen/Qwen2-Math-72B-Instruct, Qwen/Qwen2-Math-RM-72B |
Qwen2.5 | Qwen/Qwen2.5-0.5B, Qwen/Qwen2.5-0.5B-Instruct, Qwen/Qwen2.5-1.5B, Qwen/Qwen2.5-1.5B-Instruct, Qwen/Qwen2.5-3B, Qwen/Qwen2.5-3B-Instruct, Qwen/Qwen2.5-7B, Qwen/Qwen2.5-7B-Instruct, Qwen/Qwen2.5-14B, Qwen/Qwen2.5-14B-Instruct, Qwen/Qwen2.5-32B, Qwen/Qwen2.5-32B-Instruct, Qwen/Qwen2.5-72B, Qwen/Qwen2.5-72B-Instruct |
Qwen2.5-Math | Qwen/Qwen2.5-Math-1.5B, Qwen/Qwen2.5-Math-1.5B-Instruct, Qwen/Qwen2.5-Math-7B, Qwen/Qwen2.5-Math-7B-Instruct, Qwen/Qwen2.5-Math-72B, Qwen/Qwen2.5-Math-72B-Instruct, Qwen/Qwen2.5-Math-RM-72B |
Qwen2.5-Coder | Qwen/Qwen2.5-Coder-1.5B, Qwen/Qwen2.5-Coder-1.5B-Instruct, Qwen/Qwen2.5-Coder-7B, Qwen/Qwen2.5-Coder-7B-Instruct |
Yuan2 | IEITYuan/Yuan2-2B, IEITYuan/Yuan2-51B, IEITYuan/Yuan2-102B |
4D parallelism and operator optimizations are already supported for the LLaMA series, Baichuan series, Bloom series, ChatGLM series, Gemma series, Mistral series, OPT series, and Qwen series. The [LLM] model 4D parallelism and operator support list is as follows:
Model / parallel capability | Data parallelism | Tensor parallelism (basic) | Tensor parallelism (sequence parallel) | Sharding stage1 | Sharding stage2 | Sharding stage3 | Pipeline parallelism |
---|---|---|---|---|---|---|---|
Llama | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen1.5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Mixtral(moe) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ? |
Mistral | ✅ | ✅ | ? | ✅ | ✅ | ✅ | ? |
Baichuan | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Baichuan2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
ChatGLM | ✅ | ✅ | ? | ✅ | ✅ | ✅ | ? |
ChatGLM2 | ✅ | ? | ? | ✅ | ✅ | ✅ | ? |
ChatGLM3 | ✅ | ? | ? | ✅ | ✅ | ✅ | ? |
Bloom | ✅ | ✅ | ? | ✅ | ✅ | ✅ | ? |
GPT-2/GPT-3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
OPT | ✅ | ✅ | ? | ✅ | ✅ | ✅ | ? |
Gemma | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Yuan2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ? |
Large model pre-training, fine-tuning (including SFT and PEFT techniques), alignment, and quantization are already supported for the LLaMA series, Baichuan series, Bloom series, ChatGLM series, Mistral series, OPT series, and Qwen series. The [LLM] model pre-training, fine-tuning, alignment, and quantization support list is as follows:
Model | Pretrain | SFT | LoRA | FlashMask | Prefix Tuning | DPO/SimPO/ORPO | RLHF | Quantization |
---|---|---|---|---|---|---|---|---|
Llama | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ? | ? |
Mixtral | ✅ | ✅ | ✅ | ? | ? | ✅ | ? | ? |
Mistral | ✅ | ✅ | ✅ | ? | ✅ | ✅ | ? | ? |
Baichuan/Baichuan2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ? | ✅ |
ChatGLM-6B | ✅ | ✅ | ✅ | ? | ✅ | ? | ? | ✅ |
ChatGLM2/ChatGLM3 | ✅ | ✅ | ✅ | ? | ✅ | ✅ | ? | ✅ |
Bloom | ✅ | ✅ | ✅ | ? | ✅ | ? | ? | ✅ |
GPT-3 | ✅ | ✅ | ? | ? | ? | ? | ? | ? |
OPT | ✅ | ✅ | ✅ | ? | ? | ? | ? | ? |
Gemma | ✅ | ✅ | ✅ | ? | ? | ✅ | ? | ? |
Yuan | ✅ | ✅ | ✅ | ? | ? | ✅ | ? | ? |
Large model inference already supports the LLaMA series, Qwen series, Mistral series, ChatGLM series, Bloom series, and Baichuan series. It supports Weight-Only INT8 and INT4 inference, as well as WAC (Weight, Activation, Cache KV) INT8 and FP8 quantized inference. The [LLM] model inference support list is as follows:
Model name/quantization type support | FP16/BF16 | WINT8 | WINT4 | INT8-A8W8 | FP8-A8W8 | INT8-A8W8C8 |
---|---|---|---|---|---|---|
LLAMA | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen-Moe | ✅ | ✅ | ✅ | ? | ? | ? |
Mixtral | ✅ | ✅ | ✅ | ? | ? | ? |
ChatGLM | ✅ | ✅ | ✅ | ? | ? | ? |
Bloom | ✅ | ✅ | ✅ | ? | ? | ? |
Baichuan | ✅ | ✅ | ✅ | ✅ | ✅ | ? |
python >= 3.8
paddlepaddle >= 3.0.0b0
If you have not installed PaddlePaddle, please refer to the PaddlePaddle official website to install it.
pip install --upgrade paddlenlp==3.0.0b2
Or you can install the latest develop branch code through the following command:
pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html
For more detailed tutorials on PaddlePaddle and PaddleNLP installation, please see Installation.
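As a quick sanity check after installation, the snippet below verifies that both frameworks import and that PaddlePaddle's built-in self-test passes; the printed version numbers will of course depend on what you installed.

```python
import paddle
import paddlenlp

# PaddlePaddle's built-in installation self-test (checks that the install and
# the available device, e.g. GPU, work correctly).
paddle.utils.run_check()

# Confirm the installed versions.
print("paddle:", paddle.__version__)
print("paddlenlp:", paddlenlp.__version__)
```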
PaddleNLP provides a convenient, easy-to-use Auto API that can quickly load models and tokenizers. Here is an example of text generation with the Qwen/Qwen2-0.5B model:
```python
>>> from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
>>> model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
>>> input_features = tokenizer("Hello! Please introduce yourself.", return_tensors="pd")
>>> outputs = model.generate(**input_features, max_length=128)
>>> print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
['I am an AI language model. I can answer various questions, including but not limited to: weather, news, history, culture, science, education, entertainment, etc. Is there anything you need to know?']
```
```shell
# Large model pre-training (LLaMA example)
git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP  # skip if you have already cloned or downloaded PaddleNLP
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.bin
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.idx
cd .. # change folder to PaddleNLP/llm
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_pretrain.py ./config/llama/pretrain_argument.json
```
```shell
# Large model SFT fine-tuning (LLaMA example)
git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP  # skip if you have already cloned or downloaded PaddleNLP
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/datasets/examples/AdvertiseGen.tar.gz && tar -zxvf AdvertiseGen.tar.gz
cd .. # change folder to PaddleNLP/llm
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_finetune.py ./config/llama/sft_argument.json
```
For the complete large model workflow, please refer to the introduction of the PaddlePaddle Large Model Suite.
For more PaddleNLP content, please refer to:
A curated model library, including end-to-end usage of high-quality pre-trained models.
Examples for multiple scenarios, showing how to use PaddleNLP to solve various NLP problems, covering basic techniques, system applications, and extended applications.
Interactive tutorials for learning PaddleNLP quickly on the free compute platform AI Studio.
Scan the QR code in WeChat and fill out the questionnaire to join the discussion group for in-depth exchanges with community developers and the official team.
If you find PaddleNLP helpful in your research, please feel free to cite it:
```bibtex
@misc{=paddlenlp,
    title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
    author={PaddleNLP Contributors},
    howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
    year={2021}
}
```
We have drawn on the excellent design of Hugging Face's Transformers for pre-trained model usage, and we would like to express our gratitude to the authors of Hugging Face and its open source community.
PaddleNLP follows the Apache-2.0 open source license.