zero_nlp
1.0.0
- Goal: build an out-of-the-box NLP training framework for the Chinese field on top of pytorch and transformers, providing a full set of solutions for training and fine-tuning models (including large models, text-to-vector/embedding models, text generation, multi-modal models, and more);
- Data: roughly 100 GB of training data;
- Workflow: each project comes with the complete model training steps, such as data cleaning, data processing, model construction, model training, model deployment, and model illustration;
- Models: currently supports gpt2, clip, gpt-neox, dolly, llama, chatglm-6b, VisionEncoderDecoderModel, and other multi-modal large models;
- Multi-GPU chaining: most large models are now far larger than the video memory of a single consumer-grade graphics card, so several cards must be chained together to train and deploy them; some model structures were therefore modified to support multi-card operation both at training time and at inference time (a minimal sharded-loading sketch follows this list);
- Model tools: added vocabulary pruning and vocabulary expansion tutorials for large models in model_modify (a minimal vocabulary-expansion sketch appears after the table below).
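The repo's multi-card support comes from modified model structures; as a rough, generic illustration of the underlying idea of splitting one model across several GPUs, the sketch below uses the stock transformers/accelerate `device_map="auto"` loading path. The model id is a placeholder, and this is not the repo's modified code.

```python
# Minimal sketch: shard one large causal LM across all visible GPUs for inference.
# This uses the generic transformers + accelerate device_map mechanism, not the
# repo's modified model code; the model id below is only a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; swap in any supported model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so layers fit on consumer cards
    device_map="auto",          # let accelerate place layers on gpu0, gpu1, ...
)

inputs = tokenizer("你好，请介绍一下你自己。", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```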
Project | Folder | Data | Data cleaning | Large model | Model deployment | Illustration |
---|---|---|---|---|---|---|
Chinese text classification | chinese_classifier | ✅ | ✅ | ✅ | ✅ | |
Chinese gpt2 | chinese_gpt2 | ✅ | ✅ | ✅ | ✅ | |
Chinese clip | chinese_clip | ✅ | ✅ | ✅ | ✅ | |
Image generation Chinese text | VisionEncoderDecoderModel | ✅ | ✅ | ✅ | ✅ | |
Introduction to vit core source code | vit model | ✅ | | | | |
Thu-ChatGlm-6b (v1, obsolete) | simple_thu_chatglm6b | ✅ | ✅ | ✅ | ✅ | |
chatglm-v2-6b | chatglm_v2_6b_lora | ✅ | ✅ | ✅ | | |
Chinese dolly_v2_3b | dolly_v2_3b | ✅ | ✅ | ✅ | ||
Chinese llama (obsolete) | chinese_llama | ✅ | ✅ | ✅ | ||
Chinese bloom | chinese_bloom | ✅ | ✅ | ✅ | ||
Chinese falcon (note: the falcon model is similar to the bloom structure) | chinese_bloom | ✅ | ✅ | ✅ | ||
Chinese pre-training code | model_clm | ✅ | ✅ | ✅ | ||
Baichuan large model | model_baichuan | ✅ | ✅ | ✅ | ✅ | |
Model trimming ✂️ | model_modify | ✅ | ✅ | ✅ | | |
llama2 pipeline parallelism | pipeline | ✅ | ✅ | ✅ | ||
Baichuan2-7b-chat DPO | DPO baichuan2-7b-chat | ✅ | ✅ | ✅ | | |
Varying the data mix proportions during training | train_data_sample | ✅ | ✅ | ✅ | | |
internlm-base sft | internlm-sft | ✅ | ✅ | ✅ | ||
train qwen2 | train_qwen2 | ✅ | ✅ | ✅ | ✅ | |
train llava | train_llava | ✅ | ✅ | ✅ | ✅ | ✅ |
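For the vocabulary pruning and expansion tutorials mentioned above (model_modify), the sketch below shows only the expansion direction using standard transformers APIs: add new tokens to the tokenizer, then resize the model's embedding matrix. The model id and the new tokens are placeholders, not taken from the repo's tutorial.

```python
# Minimal vocabulary-expansion sketch using standard transformers APIs.
# Not the repo's tutorial code; model id and new tokens are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; the same steps apply to larger models

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Register domain-specific tokens that the original vocabulary is missing.
new_tokens = ["<domain_term_1>", "<domain_term_2>"]
num_added = tokenizer.add_tokens(new_tokens)
print(f"added {num_added} tokens, new vocab size = {len(tokenizer)}")

# Grow the (tied) embedding matrix to match the enlarged vocabulary; the new
# rows are randomly initialized and must be learned during fine-tuning.
model.resize_token_embeddings(len(tokenizer))
```

Vocabulary pruning goes in the opposite direction: keep only a subset of tokens and slice out the corresponding embedding rows to shrink the model; see model_modify for the repo's actual procedure.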
I have always felt that data flow is expressed most clearly in diagram form, so I try to provide a diagram for every task.
I have also been doing source-code walkthroughs of the transformers library; you can find the videos by Liangmulu Programmer on Bilibili.