Simplified Chinese | English
PaddleNLP is a large language model (LLM) development toolkit built on the PaddlePaddle deep learning framework. It supports efficient large-model training, lossless compression, and high-performance inference on a wide range of hardware. Easy to use and highly performant, PaddleNLP is committed to helping developers build efficient, industrial-grade LLM applications.
2024.08.08 "PaddleNLP 3.0, a powerful tool for industrial-level large language model development, is released" , with the entire process of training, pressure and push, and full coverage of mainstream models. Large models are automatically parallelized, and the entire process of training and pushing hundreds of billions of models is available out of the box. Provides industrial-grade high-performance fine-tuning and alignment solutions, leading compression inference, and multi-hardware adaptation. Covering application scenarios such as industrial-level intelligent assistants, content creation, knowledge Q&A, and key information extraction. Live broadcast time: August 22 (Thursday) 19:00. Registration link: https://www.wjx.top/vm/Y2f7FFY.aspx?udsid=143844
2024.06.27 PaddleNLP v3.0 Beta: embrace large models with a fully upgraded experience. The LLM suite is unified, with full-pipeline support for domestic AI accelerators; full support for industrial-grade LLM workflows such as PaddlePaddle 4D parallel configuration, efficient fine-tuning strategies, efficient alignment algorithms, and high-performance inference. The self-developed, fast-converging RsLoRA+ algorithm, the automatically resharding Unified Checkpoint storage mechanism, and the universally supported FastFFN and FusedQKV optimizations accelerate LLM training and inference. Mainstream models receive continued support and updates with efficient solutions.
2024.04.24 PaddleNLP v2.8: the self-developed, fast-converging RsLoRA+ algorithm greatly improves PEFT training convergence speed and quality; high-performance generation acceleration is introduced into the RLHF PPO algorithm, removing the generation-speed bottleneck in PPO training and putting PPO training performance clearly in the lead. Universal support for training optimizations such as FastFFN and FusedQKV makes large-model training faster and more stable.
Supports training and inference of large language models and natural language understanding models on NVIDIA GPU, Kunlunxin XPU, Ascend NPU, Enflame GCU, Hygon DCU, and other hardware. The suite's interface supports fast hardware switching, significantly reducing the R&D cost of migrating between devices. Currently supported natural language understanding models: List of multi-hardware natural language understanding models
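As an illustration of what hardware switching looks like in user code, here is a minimal sketch that only changes the device string before loading a model. The Auto API calls mirror the quick-start example below; the non-GPU device strings ("xpu", "npu", ...) are assumptions that require the corresponding PaddlePaddle build or custom-device plugin.

```python
import paddle
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

# Select the target device before building the model. "gpu" targets NVIDIA GPUs;
# strings such as "xpu" or "npu" are assumptions that depend on having the
# matching PaddlePaddle build / custom-device plugin installed.
paddle.set_device("gpu")

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
# The rest of the script runs unchanged after switching the device string above.
```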
Supports 4D high-performance training combining data parallelism, sharded data parallelism with grouped parameter slicing, tensor model parallelism, and pipeline model parallelism. The Trainer supports configurable distributed strategies, reducing the usage cost of composing complex distributed setups; the Unified Checkpoint storage format for large models supports dynamic resharding of model parameters across training configurations, reducing the migration cost of hardware switching.
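To make the configuration surface concrete, the sketch below writes the distributed-strategy portion of a training config in the style of the `pretrain_argument.json` files used later in this README. The field names mirror commonly used PaddleNLP Trainer arguments, but treat both the names and the example values as assumptions and check the LLM documentation for the authoritative schema.

```python
import json

# Hypothetical distributed-strategy fields for an example 8-GPU layout
# (assumption: field names follow the PaddleNLP Trainer arguments; verify
# against the documentation for your PaddleNLP version).
distributed_config = {
    "tensor_parallel_degree": 2,    # tensor model parallelism
    "pipeline_parallel_degree": 2,  # pipeline model parallelism
    "sharding": "stage2",           # grouped parameter slicing (stage1/2/3)
    "sharding_parallel_degree": 2,  # size of each sharding group
    "sequence_parallel": True,      # sequence parallelism on top of tensor parallelism
    "unified_checkpoint": True,     # Unified Checkpoint storage format
}

# Merge these fields into your full training config before launching run_pretrain.py.
with open("my_pretrain_argument.json", "w") as f:
    json.dump(distributed_config, f, indent=2)
```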
The fine-tuning algorithms deeply combine a zero-padding data flow with the high-performance FlashMask operator, reducing the padding and computation spent on invalid (padded) tokens and greatly improving fine-tuning throughput.
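The toy calculation below, which is plain arithmetic rather than PaddleNLP API, illustrates why removing padding matters: with naive right-padding every sample costs `max_len` tokens of compute, whereas a zero-padding data flow only pays for the real tokens.

```python
# Toy illustration (not PaddleNLP API): tokens processed with naive right-padding
# versus a zero-padding data flow that only computes over real tokens.
seq_lens = [37, 512, 91, 260, 1024, 48, 700, 333]  # hypothetical sample lengths
max_len = 1024

padded_tokens = len(seq_lens) * max_len  # every sample padded up to max_len
packed_tokens = sum(seq_lens)            # only the real tokens are processed

print(f"padded: {padded_tokens} tokens")
print(f"packed: {packed_tokens} tokens")
print(f"wasted: {1 - packed_tokens / padded_tokens:.1%} of compute spent on padding")
```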
The high-performance inference module of the LLM suite has built-in dynamic insertion and full-pipeline operator-fusion strategies, which greatly speed up parallel inference. The underlying implementation details are encapsulated, providing out-of-the-box high-performance parallel inference.
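As a conceptual aid, the toy loop below mimics the dynamic-insertion idea: finished sequences leave the running batch immediately and waiting requests take their slots, rather than the whole batch waiting for its slowest member. It is a simplified illustration, not the suite's actual scheduler.

```python
from collections import deque

# Conceptual toy of dynamic insertion (not PaddleNLP code).
waiting = deque([("req3", 5), ("req4", 2), ("req5", 7)])  # (request id, tokens left to generate)
running = [("req1", 3), ("req2", 1)]                       # current decoding batch
max_batch = 2

step = 0
while running or waiting:
    step += 1
    # Decode one token for every running request.
    running = [(rid, left - 1) for rid, left in running]
    # Evict finished requests and insert waiting ones right away,
    # instead of waiting for the entire batch to finish.
    running = [(rid, left) for rid, left in running if left > 0]
    while waiting and len(running) < max_batch:
        running.append(waiting.popleft())
    print(f"step {step}: running = {[rid for rid, _ in running]}")
```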
Model parameters are already supported for the LLaMA series, Baichuan series, Bloom series, ChatGLM series, Gemma series, Mistral series, OPT series, and Qwen series. The detailed [LLM] model parameter support list is as follows:
Model series | Model name |
---|---|
LLAMA | facebook/llama-7b, facebook/llama-13b, facebook/llama-30b, facebook/llama-65b |
LLama2 | meta-llama/Llama-2-7b, meta-llama/Llama-2-7b-chat, meta-llama/Llama-2-13b, meta-llama/Llama-2-13b-chat, meta-llama/Llama-2-70b, meta-llama/Llama-2-70b-chat |
LLama3 | meta-llama/Meta-Llama-3-8B, meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B, meta-llama/Meta-Llama-3-70B-Instruct |
LLama3.1 | meta-llama/Meta-Llama-3.1-8B, meta-llama/Meta-Llama-3.1-8B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3.1-70B-Instruct, meta-llama/Meta-Llama-3.1-405B, meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Llama-Guard-3-8B |
LLama3.2 | meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B, meta-llama/Llama-3.2-3B-Instruct, meta-llama/Llama-Guard-3-1B |
Baichuan | baichuan-inc/Baichuan-7B, baichuan-inc/Baichuan-13B-Base, baichuan-inc/Baichuan-13B-Chat |
Baichuan2 | baichuan-inc/Baichuan2-7B-Base, baichuan-inc/Baichuan2-7B-Chat, baichuan-inc/Baichuan2-13B-Base, baichuan-inc/Baichuan2-13B-Chat |
Bloom | bigscience/bloom-560m, bigscience/bloom-560m-bf16, bigscience/bloom-1b1, bigscience/bloom-3b, bigscience/bloom-7b1, bigscience/bloomz-560m, bigscience/bloomz-1b1, bigscience/bloomz-3b, bigscience/bloomz-7b1-mt, bigscience/bloomz-7b1-p3, bigscience/bloomz-7b1, bellegroup/belle-7b-2m |
ChatGLM | THUDM/chatglm-6b, THUDM/chatglm-6b-v1.1 |
ChatGLM2 | THUDM/chatglm2-6b |
ChatGLM3 | THUDM/chatglm3-6b |
Gemma | google/gemma-7b, google/gemma-7b-it, google/gemma-2b, google/gemma-2b-it |
Mistral | mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-7B-v0.1 |
Mixtral | mistralai/Mixtral-8x7B-Instruct-v0.1 |
OPT | facebook/opt-125m, facebook/opt-350m, facebook/opt-1.3b, facebook/opt-2.7b, facebook/opt-6.7b, facebook/opt-13b, facebook/opt-30b, facebook/opt-66b, facebook/opt-iml-1.3b, opt-iml-max-1.3b |
Qwen | qwen/qwen-7b, qwen/qwen-7b-chat, qwen/qwen-14b, qwen/qwen-14b-chat, qwen/qwen-72b, qwen/qwen-72b-chat |
Qwen1.5 | Qwen/Qwen1.5-0.5B, Qwen/Qwen1.5-0.5B-Chat, Qwen/Qwen1.5-1.8B, Qwen/Qwen1.5-1.8B-Chat, Qwen/Qwen1.5-4B, Qwen/Qwen1.5-4B-Chat, Qwen/Qwen1.5-7B, Qwen/Qwen1.5-7B-Chat, Qwen/Qwen1.5-14B, Qwen/Qwen1.5-14B-Chat, Qwen/Qwen1.5-32B, Qwen/Qwen1.5-32B-Chat, Qwen/Qwen1.5-72B, Qwen/Qwen1.5-72B-Chat, Qwen/Qwen1.5-110B, Qwen/Qwen1.5-110B-Chat, Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat |
Qwen2 | Qwen/Qwen2-0.5B, Qwen/Qwen2-0.5B-Instruct, Qwen/Qwen2-1.5B, Qwen/Qwen2-1.5B-Instruct, Qwen/Qwen2-7B, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-72B, Qwen/Qwen2-72B-Instruct, Qwen/Qwen2-57B-A14B, Qwen/Qwen2-57B-A14B-Instruct |
Qwen2-Math | Qwen/Qwen2-Math-1.5B, Qwen/Qwen2-Math-1.5B-Instruct, Qwen/Qwen2-Math-7B, Qwen/Qwen2-Math-7B-Instruct, Qwen/Qwen2-Math-72B, Qwen/Qwen2-Math-72B-Instruct, Qwen/Qwen2-Math-RM-72B |
Qwen2.5 | Qwen/Qwen2.5-0.5B, Qwen/Qwen2.5-0.5B-Instruct, Qwen/Qwen2.5-1.5B, Qwen/Qwen2.5-1.5B-Instruct, Qwen/Qwen2.5-3B, Qwen/Qwen2.5-3B-Instruct, Qwen/Qwen2.5-7B, Qwen/Qwen2.5-7B-Instruct, Qwen/Qwen2.5-14B, Qwen/Qwen2.5-14B-Instruct, Qwen/Qwen2.5-32B, Qwen/Qwen2.5-32B-Instruct, Qwen/Qwen2.5-72B, Qwen/Qwen2.5-72B-Instruct |
Qwen2.5-Math | Qwen/Qwen2.5-Math-1.5B, Qwen/Qwen2.5-Math-1.5B-Instruct, Qwen/Qwen2.5-Math-7B, Qwen/Qwen2.5-Math-7B-Instruct, Qwen/Qwen2.5-Math-72B, Qwen/Qwen2.5-Math-72B-Instruct, Qwen/Qwen2.5-Math-RM-72B |
Qwen2.5-Coder | Qwen/Qwen2.5-Coder-1.5B, Qwen/Qwen2.5-Coder-1.5B-Instruct, Qwen/Qwen2.5-Coder-7B, Qwen/Qwen2.5-Coder-7B-Instruct |
Yuan2 | IEITYuan/Yuan2-2B, IEITYuan/Yuan2-51B, IEITYuan/Yuan2-102B |
4D parallelism and operator optimizations are already supported for the LLaMA series, Baichuan series, Bloom series, ChatGLM series, Gemma series, Mistral series, OPT series, and Qwen series. The [LLM] model 4D parallelism and operator support list is as follows:
Model / parallel capability | Data parallelism | Tensor parallelism (basic) | Tensor parallelism (sequence parallel) | Sharding stage1 | Sharding stage2 | Sharding stage3 | Pipeline parallelism |
---|---|---|---|---|---|---|---|
Llama | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen1.5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Mixtral(moe) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ? |
Mistral | ✅ | ✅ | ? | ✅ | ✅ | ✅ | ? |
Baichuan | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Baichuan2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
ChatGLM | ✅ | ✅ | ? | ✅ | ✅ | ✅ | ? |
ChatGLM2 | ✅ | ? | ? | ✅ | ✅ | ✅ | ? |
ChatGLM3 | ✅ | ? | ? | ✅ | ✅ | ✅ | ? |
Bloom | ✅ | ✅ | ? | ✅ | ✅ | ✅ | ? |
GPT-2/GPT-3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
OPT | ✅ | ✅ | ? | ✅ | ✅ | ✅ | ? |
Gemma | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Yuan2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ? |
Large model pre-training, fine-tuning (including SFT and PEFT techniques), alignment, and quantization are already supported for the LLaMA series, Baichuan series, Bloom series, ChatGLM series, Mistral series, OPT series, and Qwen series. The [LLM] model pre-training, fine-tuning, alignment, and quantization support list is as follows:
Model | Pretrain | SFT | LoRA | FlashMask | Prefix Tuning | DPO/SimPO/ORPO | RLHF | Quantization |
---|---|---|---|---|---|---|---|---|
Llama | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ? | ? |
Mixtral | ✅ | ✅ | ✅ | ? | ? | ✅ | ? | ? |
Mistral | ✅ | ✅ | ✅ | ? | ✅ | ✅ | ? | ? |
Baichuan/Baichuan2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ? | ✅ |
ChatGLM-6B | ✅ | ✅ | ✅ | ? | ✅ | ? | ? | ✅ |
ChatGLM2/ChatGLM3 | ✅ | ✅ | ✅ | ? | ✅ | ✅ | ? | ✅ |
Bloom | ✅ | ✅ | ✅ | ? | ✅ | ? | ? | ✅ |
GPT-3 | ✅ | ✅ | ? | ? | ? | ? | ? | ? |
OPT | ✅ | ✅ | ✅ | ? | ? | ? | ? | ? |
Gemma | ✅ | ✅ | ✅ | ? | ? | ✅ | ? | ? |
Yuan | ✅ | ✅ | ✅ | ? | ? | ✅ | ? | ? |
Large model inference already supports the LLaMA series, Qwen series, Mistral series, ChatGLM series, Bloom series, and Baichuan series. It supports Weight-Only INT8 and INT4 inference, as well as WAC (Weight, Activation, Cache KV) INT8 and FP8 quantized inference. The [LLM] model inference support list is as follows:
Model name/quantization type support | FP16/BF16 | WINT8 | WINT4 | INT8-A8W8 | FP8-A8W8 | INT8-A8W8C8 |
---|---|---|---|---|---|---|
LLAMA | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen-Moe | ✅ | ✅ | ✅ | ? | ? | ? |
Mixtral | ✅ | ✅ | ✅ | ? | ? | ? |
ChatGLM | ✅ | ✅ | ✅ | ? | ? | ? |
Bloom | ✅ | ✅ | ✅ | ? | ? | ? |
Baichuan | ✅ | ✅ | ✅ | ✅ | ✅ | ? |
python >= 3.8
paddlepaddle >= 3.0.0b0
If you have not installed PaddlePaddle, please refer to the PaddlePaddle official website to install it.
pip install --upgrade paddlenlp==3.0.0b2
Or you can install the latest develop branch code through the following command:
pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html
For more detailed tutorials on PaddlePaddle and PaddleNLP installation, please see Installation.
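As a quick sanity check after installation, the snippet below verifies that both frameworks import and that PaddlePaddle's built-in self-test passes; the printed version numbers will of course depend on what you installed.

```python
import paddle
import paddlenlp

# PaddlePaddle's built-in installation self-test (checks that the install and
# the available device, e.g. GPU, work correctly).
paddle.utils.run_check()

# Confirm the installed versions.
print("paddle:", paddle.__version__)
print("paddlenlp:", paddlenlp.__version__)
```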
PaddleNLP provides a convenient, easy-to-use Auto API that can quickly load models and tokenizers. Here is an example of text generation with the Qwen/Qwen2-0.5B model:
```python
>>> from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
>>> model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
>>> input_features = tokenizer("Hello! Please introduce yourself.", return_tensors="pd")
>>> outputs = model.generate(**input_features, max_length=128)
>>> print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
['I am an AI language model. I can answer various questions, including but not limited to: weather, news, history, culture, science, education, entertainment, etc. Is there anything you need to know?']
```
```shell
# Large model pre-training (LLaMA example)
git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP  # skip if you have already cloned or downloaded PaddleNLP
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.bin
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.idx
cd .. # change folder to PaddleNLP/llm
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_pretrain.py ./config/llama/pretrain_argument.json
```
```shell
# Large model SFT fine-tuning (LLaMA example)
git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP  # skip if you have already cloned or downloaded PaddleNLP
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/datasets/examples/AdvertiseGen.tar.gz && tar -zxvf AdvertiseGen.tar.gz
cd .. # change folder to PaddleNLP/llm
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_finetune.py ./config/llama/sft_argument.json
```
For the complete large model workflow, please refer to the introduction of the PaddlePaddle Large Model Suite.
For more PaddleNLP content, please refer to:
A curated model library, including end-to-end usage of high-quality pre-trained models.
Examples for multiple scenarios, showing how to use PaddleNLP to solve various NLP problems, covering basic techniques, system applications, and extended applications.
Interactive tutorials for learning PaddleNLP quickly on the free compute platform AI Studio.
Scan the QR code in WeChat and fill out the questionnaire to join the discussion group for in-depth exchanges with community developers and the official team.
If you find PaddleNLP helpful in your research, please feel free to cite it:
```bibtex
@misc{=paddlenlp,
    title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
    author={PaddleNLP Contributors},
    howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
    year={2021}
}
```
We have drawn on the excellent design of Hugging Face's Transformers for pre-trained model usage, and we would like to express our gratitude to the authors of Hugging Face and its open source community.
PaddleNLP follows the Apache-2.0 open source license.