Registration link: https://xihe.mindspore.cn/course/foundation-model-v2/introduction
(Note: Registration is required to take part in the free course! Please also join the QQ group, where all follow-up course announcements will be posted.)
The second phase of the course will be livestreamed on Bilibili every Saturday from 14:00 to 15:00, starting October 14.
The slides and code for each lecture will be uploaded to GitHub as the course progresses, and video replays of the series will be archived on Bilibili. A review of each lecture's key points and a preview of the next lecture are published on the MindSpore official account. Everyone is also welcome to take on the series of large model tasks released by the MindSpore community.
Because the course runs over a long period, the schedule may be adjusted slightly along the way; the final announcement shall prevail. Thank you for your understanding!
Everyone is warmly welcome to help build the course. Interesting work developed on top of the course content can be submitted to the MindSpore large model platform.
If you find problems in the courseware or code while studying, would like us to cover specific topics, or have any suggestions for the course, feel free to open an issue directly in this repository.
The MindSpore technical open course is now in full swing. It is open to all developers interested in large models, and will guide you to combine theory with practice, progressively deepening your understanding of large model technology from the basics to advanced topics.
In the completed first phase (Lectures 1-10), we started from the Transformer, analyzed the evolution path of ChatGPT, and guided you step by step through building a simplified version of "ChatGPT".
The ongoing second phase (Lecture 11 onwards) has been comprehensively upgraded on the basis of the first phase. It focuses on end-to-end practice with large models, from development to application, covers more cutting-edge large model topics, and features a more diverse lineup of lecturers. We look forward to having you join us!
Lecture | Topic | Course Introduction | Video | Courseware and Code | Knowledge Point Summary |
---|---|---|---|---|---|
Lecture 1 | Transformer | The principle of multi-head self-attention. How masking is handled in masked self-attention. Training a Transformer-based machine translation task. | link | link | link |
Lecture 2 | BERT | BERT model design based on Transformer Encoder: MLM and NSP tasks. BERT's paradigm for fine-tuning downstream tasks. | link | link | link |
Lecture 3 | GPT | GPT model design based on Transformer Decoder: Next token prediction. GPT downstream task fine-tuning paradigm. | link | link | link |
Lecture 4 | GPT2 | The core innovations of GPT2: task conditioning and zero-shot learning; implementation details of the model, described as changes relative to GPT1. | link | link | link |
Lecture 5 | MindSpore automatic parallelism | Data parallelism, model parallelism, pipeline parallelism, memory optimization, and other techniques based on MindSpore's distributed parallelism capabilities. | link | link | link |
Lecture 6 | Code pre-training | The development history of code pre-training. Code data preprocessing. CodeGeeX, a large pre-trained model for code. | link | link | link |
Lecture 7 | Prompt Tuning | The shift from the pretrain-finetune paradigm to the prompt tuning paradigm. Techniques related to hard prompts and soft prompts. Adapting to tasks by changing only the prompt text. | link | link | link |
Lecture 8 | Multimodal pre-trained large model | The design, data processing, and advantages of the Zidong Taichu multimodal large model; a theoretical overview of speech recognition, its system framework, current state, and challenges. | link | / | / |
Lecture 9 | Instruction Tuning | The core idea of instruction tuning: enabling the model to understand task descriptions (instructions). Limitations of instruction tuning: it cannot support open-domain creative tasks and cannot align LM training objectives with human needs. Chain-of-thought: by providing examples in the prompt, the model can reason by analogy. | link | link | link |
Lecture 10 | RLHF | The core idea of RLHF: aligning LLMs with human behavior. A breakdown of RLHF: LLM fine-tuning, reward model training based on human feedback, and model fine-tuning via the PPO reinforcement learning algorithm. | link | link | Updating |
Lecture 11 | ChatGLM | GLM model structure, evolution from GLM to ChatGLM, ChatGLM inference deployment code demonstration | link | link | link |
Lecture 12 | Multimodal remote sensing intelligent interpretation foundation model | In this lecture, Mr. Sun Xian, researcher and deputy laboratory director at the Institute of Aerospace Information Innovation, Chinese Academy of Sciences, explains the multimodal remote sensing interpretation foundation model: the development and challenges of intelligent remote sensing technology in the era of large models, the technical routes and solutions of remote sensing foundation models, and typical application scenarios. | link | / | link |
Lecture 13 | ChatGLM2 | ChatGLM2 technical analysis, ChatGLM2 inference deployment code demonstration, ChatGLM3 feature introduction | link | link | link |
Lecture 14 | Text generation and decoding principles | Taking MindNLP as an example to explain the principles and implementation of search and sampling technology | link | link | link |
Lecture 15 | LLaMA | Background of LLaMA and an introduction to the "alpaca" family of models; analysis of the LLaMA model structure; a code demonstration of LLaMA inference deployment. | link | link | link |
Lecture 16 | LLaMA2 | An introduction to the LLaMA2 model structure; a code walkthrough demonstrating LLaMA2 chat deployment. | link | link | link |
Lecture 17 | Pengcheng Mind | The Pengcheng Mind 200B model is an autoregressive language model with 200 billion parameters. It was trained at large scale over a long period on the "Pengcheng Cloud Brain II" thousand-card cluster at the China Computing Network hub node, based on MindSpore's multi-dimensional distributed parallelism technology. The model focuses on core Chinese-language capabilities while also covering English and some multilingual capabilities, and has completed training on 1.8T tokens. | link | / | link |
Lecture 18 | CPM-Bee | Introducing CPM-Bee pre-training, inference, fine-tuning and live code demonstration | link | link | link |
Lecture 19 | RWKV1-4 | The decline of RNNs and the rise of Transformers. Are Transformers universal? The drawbacks of self-attention. RWKV, a new RNN that challenges the Transformer. Hands-on practice with the RWKV model based on MindNLP. | link | / | link |
Lecture 20 | MoE | The past and present of MoE. The implementation foundation of MoE: AllToAll communication. Mixtral 8x7B: currently the best open-source MoE large model. MoE and lifelong learning. An inference demonstration of Mixtral 8x7B based on MindSpore. | link | link | link |
Lecture 21 | Parameter-efficient fine-tuning | An introduction to the principles and code implementation of LoRA and P-Tuning. | link | link | link |
Lecture 22 | Prompt Engineering | Prompt engineering: 1. What is a prompt? 2. How to judge the quality of a prompt? 3. How to write a high-quality prompt? 4. How to produce high-quality prompts? 5. A brief discussion of some problems we encountered when putting prompts into practice. | link | / | link |
Lecture 23 | Automatic search optimization strategy for multi-dimensional hybrid parallelism | Topic 1: a time-cost model and an improved multi-dimensional bisection method / Topic 2: applying the APSS algorithm | link (Part 1 / Part 2) | link | |
Lecture 24 | Introduction to the Shusheng·Puyu large model open-source full-chain toolchain and hands-on intelligent agent development | In this lecture, we are fortunate to have Mr. Wen Xing, technical operator and technical evangelist of the Shusheng·Puyu community, and Mr. Geng Li, technical evangelist of MindSpore, explain in detail the open-source full-chain toolchain of the Shusheng·Puyu large model and demonstrate how to fine-tune, run inference, and develop intelligent agents with Shusheng·Puyu. | link | / | link |
Lecture 25 | RAG | | | | |
Lecture 26 | LangChain module analysis | Analysis of the Models, Prompts, Memory, Chains, Agents, Indexes, and Callbacks modules, with case studies. | | | |
Lecture 27 | RWKV5-6 | / | | | |
Lecture 28 | Quantization | An introduction to low-bit quantization and other related model quantization techniques. | | | |