This course is called Artificial Intelligence System in Chinese and System for AI in English. It explains the design of computer systems that support artificial intelligence. The terms artificial intelligence system, AI-System, and System for AI are used interchangeably throughout the course.
This course is one of the artificial intelligence tutorials planned by the Microsoft Artificial Intelligence Education and Co-construction Community. It belongs to the basic tutorial module, where its course number and name is A6-Artificial Intelligence System.
Visit the A-Basic Tutorial module of the Microsoft Artificial Intelligence Education and Co-construction Community for more related content.
In recent years, artificial intelligence, and deep learning in particular, has developed rapidly. This progress is inseparable from continuous advances in computer hardware and software systems. For the foreseeable future, the development of artificial intelligence will continue to rely on joint innovation that combines computer systems and artificial intelligence. Computer systems now support artificial intelligence at a larger scale and with higher complexity, which requires not only more system innovation but also systematic thinking and methodology. At the same time, artificial intelligence is itself being used to support the design of complex systems.
We have noticed that most current artificial intelligence courses, especially those on deep learning and machine learning, focus mainly on theory, algorithms, or applications, while systems courses are rare. We hope this course will make artificial intelligence education more comprehensive and in-depth, and help cultivate talent at the intersection of artificial intelligence and systems.
This course is designed primarily for senior undergraduate and graduate students. It aims to help them:
Gain a complete understanding of the computer system architecture that supports deep learning, and learn system design across the full deep learning life cycle through practical problems.
Learn about cutting-edge research that combines systems and artificial intelligence, including AI for Systems and Systems for AI, so that they can better find and define meaningful research questions.
Build practical skills through experiments designed from a systems research perspective. Students are encouraged to implement and optimize system modules by operating and applying mainstream and recent frameworks, platforms, and tools, improving their ability to solve practical problems rather than only learning how to use the tools.
Prerequisite courses: C/C++/Python, computer architecture, introduction to algorithms
The course mainly includes the following three modules:
The first part covers the basic knowledge of artificial intelligence, a full-stack overview of artificial intelligence systems, and the systematic design and methodology of deep learning systems.
The second part is an advanced course, including the most cutting-edge research areas at the intersection of systems and artificial intelligence.
The third part is the supporting experimental courses, including the most mainstream frameworks, platforms and tools, as well as a series of experimental projects.
The first part focuses on basic knowledge, while the other two parts will be adjusted as technology advances in academia and industry. The last two parts are organized in modular form so that they can be adjusted or combined with other CS courses (such as compiler principles) as advanced lecture notes or internship projects.
The design of this course will also draw on the research results and experience of Microsoft Research Asia in the intersection of artificial intelligence and systems, including some platforms and tools developed by Microsoft and the research institute. The course also encourages other schools and teachers to add and adjust more advanced topics or other experiments according to their own needs.
Basic courses
| Course number | Handout name | Remark |
|---|---|---|
| 1 | Course introduction | Course overview and system/AI fundamentals |
| 2 | Artificial intelligence system overview | Development history of artificial intelligence systems, basics of neural networks, basics of artificial intelligence systems |
| 3 | Basics of deep neural network computing frameworks | Backpropagation and automatic differentiation, tensors, directed acyclic graphs, execution graphs. Papers and systems: PyTorch, TensorFlow |
| 4 | Matrix operations and computer architecture | Matrix operations, CPU/SIMD, GPGPU, ASIC/TPU. Papers and systems: BLAS, TPU |
| 5 | Distributed training algorithms | Data parallelism, model parallelism, distributed SGD. Papers and systems: PipeDream |
| 6 | Distributed training systems | MPI, parameter servers, all-reduce, RDMA. Papers and systems: Horovod |
| 7 | Heterogeneous computing cluster scheduling and resource management systems | Running DNN tasks on a cluster: containers, resource allocation, scheduling. Papers and systems: Kubeflow, OpenPAI, Gandiva |
| 8 | Deep learning inference systems | Efficiency, latency, throughput, deployment. Papers and systems: TensorRT, TensorFlow Lite, ONNX |
Advanced courses
| Course number | Handout name | Remark |
|---|---|---|
| 9 | Compilation and optimization of computational graphs | IR, subgraph pattern matching, matrix multiplication and memory optimization. Papers and systems: XLA, MLIR, TVM, NNFusion |
| 10 | Compression and sparsification optimization of neural networks | Model compression, sparsification, pruning |
| 11 | Automated machine learning systems | Hyperparameter tuning, neural architecture search (NAS). Papers and systems: Hyperband, SMAC, ENAS, AutoKeras, NNI |
| 12 | Reinforcement learning systems | RL theory, RL systems. Papers and systems: A3C, RLlib, AlphaZero |
| 13 | Security and privacy | Federated learning, security, privacy. Papers and systems: DeepFake |
| 14 | Using artificial intelligence to optimize computer systems | Artificial intelligence applied to traditional systems problems and to system algorithms. Papers and systems: streaming media systems, database indexing, system parameter tuning, chip design, predictive resource scheduling |
Basic experiments
| Experiment number | Experiment name | Remark |
|---|---|---|
| Experiment 1 | Getting-started examples for frameworks and tools | |
| Experiment 2 | Customize a new tensor operation | |
| Experiment 3 | CUDA implementation and optimization | |
| Experiment 4 | Implementation or optimization of AllReduce | |
| Experiment 5 | Configure a container for cloud training or inference | |
Advanced experiments
| Experiment number | Experiment name | Remark |
|---|---|---|
| Experiment 6 | Learn to use the scheduling management system | |
| Experiment 7 | Distributed training task exercise | |
| Experiment 8 | Automated machine learning system exercises | |
| Experiment 9 | Reinforcement learning system exercises | |
The textbook "Artificial Intelligence System" is one of the artificial intelligence textbooks planned by the Microsoft Artificial Intelligence Education and Co-construction Community. We have noticed that most current artificial intelligence textbooks, especially those on deep learning and machine learning, focus mainly on theory, algorithms, or applications, while systems textbooks are rare. We hope this textbook can make artificial intelligence systems education more systematic and widely accessible, and help cultivate talent at the intersection of artificial intelligence and systems.
The print edition of the textbook, "Deep Learning System Design: Theory and Practice", has been published. We welcome you to read it.
<TBD>
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.
Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at https://go.microsoft.com/fwlink/?LinkID=254653.
Privacy information can be found at https://privacy.microsoft.com/en-us/
Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.