NVIDIA NeMo Framework is a scalable and cloud-native generative AI framework built for researchers and PyTorch developers working on Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV) domains. It is designed to help you efficiently create, customize, and deploy new generative AI models by leveraging existing code and pre-trained model checkpoints.
For technical documentation, please see the NeMo Framework User Guide.
NVIDIA NeMo 2.0 introduces several significant improvements over its predecessor, NeMo 1.0, enhancing flexibility, performance, and scalability.
Python-Based Configuration - NeMo 2.0 transitions from YAML files to a Python-based configuration, providing more flexibility and control. This shift makes it easier to extend and customize configurations programmatically (see the sketch after this list).
Modular Abstractions - By adopting PyTorch Lightning’s modular abstractions, NeMo 2.0 simplifies adaptation and experimentation. This modular approach allows developers to more easily modify and experiment with different components of their models.
Scalability - NeMo 2.0 seamlessly scales large-scale experiments across thousands of GPUs using NeMo-Run, a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across computing environments.
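As a concrete illustration of the Python-based configuration and NeMo-Run workflow described above, here is a minimal sketch; the recipe name, directory, and override shown are illustrative and assume a recent NeMo 2.0 release:

```python
# Minimal sketch: configure and launch a pretraining run in pure Python.
# The llama3_8b recipe and its arguments assume a recent NeMo 2.0 release.
import nemo_run as run
from nemo.collections import llm

# Recipes are plain Python objects, so any field can be inspected or
# overridden programmatically instead of through YAML.
recipe = llm.llama3_8b.pretrain_recipe(
    name="llama3_8b_pretrain",
    dir="/checkpoints/llama3",   # illustrative checkpoint directory
    num_nodes=1,
    num_gpus_per_node=8,
)
recipe.trainer.max_steps = 100   # programmatic override of a nested field

# NeMo-Run handles execution; LocalExecutor runs on the current machine.
run.run(recipe, executor=run.LocalExecutor())
```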
Overall, these enhancements make NeMo 2.0 a powerful, scalable, and user-friendly framework for AI model development.
Important
NeMo 2.0 is currently supported by the LLM (large language model) and VLM (vision language model) collections.
All NeMo models are trained with Lightning. Training is automatically scalable to thousands of GPUs.
When applicable, NeMo models leverage cutting-edge distributed training techniques, incorporating parallelism strategies to enable efficient training of very large models. These techniques include Tensor Parallelism (TP), Pipeline Parallelism (PP), Fully Sharded Data Parallelism (FSDP), Mixture-of-Experts (MoE), and Mixed Precision Training with BFloat16 and FP8, as well as others.
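As an illustration of how these strategies are combined, NeMo 2.0 exposes parallelism settings through its Megatron-aware Lightning strategy; the following is a minimal sketch, with class and argument names as found in recent NeMo 2.0 releases:

```python
# Minimal sketch: tensor parallelism (TP), pipeline parallelism (PP), and
# BF16 mixed precision in a NeMo 2.0 trainer. Sizes are illustrative.
from nemo import lightning as nl

trainer = nl.Trainer(
    devices=8,
    num_nodes=2,
    strategy=nl.MegatronStrategy(
        tensor_model_parallel_size=4,    # TP: shard each layer across 4 GPUs
        pipeline_model_parallel_size=2,  # PP: split layers into 2 stages
    ),
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
)
```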
NeMo Transformer-based LLMs and MMs utilize NVIDIA Transformer Engine for FP8 training on NVIDIA Hopper GPUs, while leveraging NVIDIA Megatron Core for scaling Transformer model training.
NeMo LLMs can be aligned with state-of-the-art methods such as SteerLM, Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). See NVIDIA NeMo Aligner for more information.
In addition to supervised fine-tuning (SFT), NeMo also supports the latest parameter efficient fine-tuning (PEFT) techniques such as LoRA, P-Tuning, Adapters, and IA3. Refer to the NeMo Framework User Guide for the full list of supported models and techniques.
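For example, a LoRA run can be requested through a NeMo 2.0 fine-tuning recipe; the sketch below is illustrative, and the peft_scheme argument assumes a recent NeMo 2.0 release:

```python
# Minimal sketch: request LoRA fine-tuning through a NeMo 2.0 recipe.
from nemo.collections import llm

recipe = llm.llama3_8b.finetune_recipe(
    name="llama3_8b_lora",
    num_nodes=1,
    num_gpus_per_node=8,
    peft_scheme="lora",  # PEFT scheme; full SFT is also supported
)
```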
NeMo LLMs and MMs can be deployed and optimized with NVIDIA NeMo Microservices.
NeMo ASR and TTS models can be optimized for inference and deployed for production use cases with NVIDIA Riva.
Important
NeMo Framework Launcher is compatible with NeMo version 1.0 only. NeMo-Run is recommended for launching experiments using NeMo 2.0.
NeMo Framework Launcher is a cloud-native tool that streamlines the NeMo Framework experience. It is used for launching end-to-end NeMo Framework training jobs on cloud service providers (CSPs) and Slurm clusters.
The NeMo Framework Launcher includes extensive recipes, scripts, utilities, and documentation for training NeMo LLMs. It also includes the NeMo Framework Autoconfigurator, which is designed to find the optimal model parallel configuration for training on a specific cluster.
To get started quickly with the NeMo Framework Launcher, please see the NeMo Framework Playbooks. The NeMo Framework Launcher does not currently support ASR and TTS training, but it will soon.
Getting started with NeMo Framework is easy. State-of-the-art pretrained NeMo models are freely available on Hugging Face Hub and NVIDIA NGC. These models can be used to generate text or images, transcribe audio, and synthesize speech in just a few lines of code.
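For example, transcribing an audio file with a pretrained ASR model takes only a few lines; the model name and audio path below are illustrative:

```python
# Minimal sketch: load a pretrained ASR model from NGC/Hugging Face Hub
# and transcribe a local audio file. Model name and path are illustrative.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")
transcripts = asr_model.transcribe(["path/to/audio.wav"])
print(transcripts[0])
```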
We have extensive tutorials that can be run on Google Colab or with our NGC NeMo Framework Container. We also have playbooks for users who want to train NeMo models with the NeMo Framework Launcher.
For advanced users who want to train NeMo models from scratch or fine-tune existing NeMo models, we have a full suite of example scripts that support multi-GPU/multi-node training.
| Version | Description |
| --- | --- |
| Latest | Documentation of the latest (i.e. main) branch. |
| Stable | Documentation of the stable (i.e. most recent release) branch. |
The NeMo Framework can be installed in a variety of ways. Depending on your domain, you may find one of the following installation methods more suitable.
Important: We strongly recommend that you start with a base NVIDIA PyTorch container: nvcr.io/nvidia/pytorch:24.02-py3.
Install NeMo in a fresh Conda environment:
conda create --name nemo python==3.10.12
conda activate nemo
Install PyTorch using the official PyTorch configurator:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
The exact command to install PyTorch may depend on your system; use the configurator to find the right command for yours.
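Once PyTorch is installed, you can quickly verify that it sees your GPU:

```python
# Sanity check: confirm PyTorch was installed with working CUDA support.
import torch
print(torch.__version__, torch.cuda.is_available())
```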
Then, install NeMo via pip or from source. We do not provide NeMo on conda-forge or any other Conda channel.
To install the nemo_toolkit, use the following installation method:
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython packaging
pip install nemo_toolkit['all']
Depending on the shell used, you may need to use the "nemo_toolkit[all]" specifier instead in the above command.
To install a specific domain of NeMo, you must first install the nemo_toolkit using the instructions listed above. Then, run the following domain-specific commands:
pip install nemo_toolkit['asr']
pip install nemo_toolkit['nlp']
pip install nemo_toolkit['tts']
pip install nemo_toolkit['vision']
pip install nemo_toolkit['multimodal']
If you want to work with a specific version of NeMo from a particular GitHub branch (e.g., main), use the following installation method:
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython packaging
python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all]
If you want to clone the NeMo GitHub repository and contribute to NeMo open-source development work, use the following installation method:
apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh
If you only want the toolkit without the additional Conda-based dependencies, you can replace reinstall.sh with pip install -e . when your PWD is the root of the NeMo repository.
To install NeMo on Mac computers with the Apple M-Series GPU, you need to create a new Conda environment, install PyTorch 2.0 or higher, and then install the nemo_toolkit.
Important: This method is only applicable to the ASR domain.
Run the following code:
# create and activate a fresh Conda environment, then install PyTorch 2.0
# or higher using the configurator described above
conda create --name nemo python==3.10.12
conda activate nemo
# [optional] install mecab using Homebrew, to use sacrebleu for NLP collection
# you can install Homebrew here: https://brew.sh
brew install mecab
# [optional] install pynini using Conda, to use text normalization
conda install -c conda-forge pynini
# install Cython manually
pip install cython packaging
# clone the repo and install the toolkit
git clone https://github.com/NVIDIA/NeMo
cd NeMo
pip install 'nemo_toolkit[all]'
# Note that only the ASR toolkit is guaranteed to work on MacBook - so for MacBook use pip install 'nemo_toolkit[asr]'
To install the Windows Subsystem for Linux (WSL), run the following code in PowerShell:
wsl --install
# [note] If you run wsl --install and see the WSL help text, it means WSL is already installed.
To learn more about installing WSL, refer to Microsoft's official documentation.
After installing your Linux distribution with WSL, two options are available:
Option 1: Open the distribution (Ubuntu by default) from the Start menu and follow the instructions.
Option 2: Launch the Terminal application. Download it from Microsoft's Windows Terminal page if not installed.
Next, follow the instructions for Linux systems, as provided above. For example:
apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh
For optimal performance of a Recurrent Neural Network Transducer (RNNT), install the Numba package from Conda.
Run the following code:
conda remove numba
pip uninstall numba
conda install -c conda-forge numba
If you work with the LLM and MM domains, three additional dependencies are required: NVIDIA Apex, NVIDIA Transformer Engine, and NVIDIA Megatron Core. When working with the main branch, these dependencies may require a recent commit.
The most recent working versions of these dependencies are here:
export apex_commit=810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c
export te_commit=bfe21c3d68b0a9951e5716fb520045db53419c5e
export mcore_commit=02871b4df8c69fac687ab6676c4246e936ce92d0
export nv_pytorch_tag=24.02-py3
When using a released version of NeMo, please refer to the Software Component Versions for the correct versions.
We recommend that you start with a base NVIDIA PyTorch container: nvcr.io/nvidia/pytorch:24.02-py3.
If starting with a base NVIDIA PyTorch container, you must first launch the container:
docker run \
  --gpus all \
  -it \
  --rm \
  --shm-size=16g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  nvcr.io/nvidia/pytorch:$nv_pytorch_tag
Next, you need to install the dependencies.
NVIDIA Apex is required for LLM and MM domains. Although Apex is pre-installed in the NVIDIA PyTorch container, you may need to update it to a newer version.
To install Apex, run the following code:
git clone https://github.com/NVIDIA/apex.git
cd apex
git checkout $apex_commit
pip install . -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam --group_norm"
When attempting to install Apex separately from the NVIDIA PyTorch container, you might encounter an error if the CUDA version on your system is different from the one used to compile PyTorch. To bypass this error, you can comment out the relevant line in the setup file located in the Apex repository on GitHub here: https://github.com/NVIDIA/apex/blob/master/setup.py#L32.
cuda-nvprof is needed to install Apex. The version should match the CUDA version that you are using.
To install cuda-nvprof, run the following code:
conda install -c nvidia cuda-nvprof=11.8
Finally, install the packaging module:
pip install packaging
To install the most recent versions of Apex locally, it might be necessary to remove the pyproject.toml file from the Apex directory.
NVIDIA Transformer Engine is required for LLM and MM domains. Although the Transformer Engine is pre-installed in the NVIDIA PyTorch container, you may need to update it to a newer version.
The Transformer Engine facilitates training with FP8 precision on NVIDIA Hopper GPUs and introduces many enhancements for the training of Transformer-based models. Refer to Transformer Engine for information.
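As an illustration, FP8 execution is enabled through Transformer Engine's autocast context. The sketch below follows the pattern from the Transformer Engine documentation and assumes an FP8-capable GPU (e.g. Hopper); layer sizes are illustrative:

```python
# Minimal sketch of FP8 training with Transformer Engine.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 scaling recipe; all arguments are optional.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

layer = te.Linear(768, 3072, bias=True)      # TE modules are created on GPU
inp = torch.randn(2048, 768, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)

out.sum().backward()  # backward pass runs outside the autocast context
```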
To install Transformer Engine, run the following code:
git clone https://github.com/NVIDIA/TransformerEngine.git &&
cd TransformerEngine &&
git checkout $te_commit &&
git submodule init && git submodule update &&
NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .
Transformer Engine requires PyTorch to be built with at least CUDA 11.8.
Megatron Core is required for LLM and MM domains. Megatron Core is a library for scaling large Transformer-based models. NeMo LLMs and MMs leverage Megatron Core for model parallelism, transformer architectures, and optimized PyTorch datasets.
To install Megatron Core, run the following code:
git clone https://github.com/NVIDIA/Megatron-LM.git &&
cd Megatron-LM &&
git checkout $mcore_commit &&
pip install . &&
cd megatron/core/datasets &&
make
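After installing all three dependencies, a quick import check confirms they are visible to Python (these are the standard top-level module names):

```python
# Sanity check: confirm Apex, Transformer Engine, and Megatron Core import.
import apex
import transformer_engine.pytorch as te
import megatron.core

print("Apex, Transformer Engine, and Megatron Core imported successfully.")
```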
NeMo Text Processing, specifically Inverse Text Normalization, is now a separate repository. It is located here: https://github.com/NVIDIA/NeMo-text-processing.
NeMo containers are launched concurrently with NeMo version updates. NeMo Framework now supports LLMs, MMs, ASR, and TTS in a single consolidated Docker container. You can find additional information about released containers on the NeMo releases page.
To use a pre-built container, run the following code:
docker pull nvcr.io/nvidia/nemo:24.05
To build a NeMo container from the Dockerfile on a branch, run the following code:
DOCKER_BUILDKIT=1 docker build -f Dockerfile -t nemo:latest .
If you choose to work with the main branch, we recommend using NVIDIA's PyTorch container version 23.10-py3 and then installing from GitHub.
docker run --gpus all -it --rm -v <nemo_github_folder>:/NeMo --shm-size=8g \
  -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 \
  --ulimit stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:23.10-py3
The FAQ can be found on the NeMo Discussions board. You are welcome to ask questions or start discussions there.
We welcome community contributions! Please refer to CONTRIBUTING.md for the process.
We provide an ever-growing list of publications that utilize the NeMo Framework.
To contribute an article to the collection, please submit a pull request to the gh-pages-src branch of this repository. For detailed information, please consult the README on the gh-pages-src branch.