nncf

v2.14.0

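Neural Network Compression Framework (NNCF)

Key Features • Installation • Documentation • Usage • Tutorials and Samples • Third-party Integration • Model Zoo

Neural Network Compression Framework (NNCF) provides a suite of post-training and training-time algorithms for optimizing inference of neural networks in OpenVINO™ with a minimal accuracy drop.

NNCF is designed to work with models from PyTorch, TorchFX, TensorFlow, ONNX and OpenVINO™.

NNCF provides samples that demonstrate the usage of compression algorithms for different use cases and models. See the compression results achievable with the NNCF-powered samples on the NNCF Model Zoo page.

The framework is organized as a Python* package that can be built and used in standalone mode. The framework architecture is unified to make it easy to add different compression algorithms for both the PyTorch and TensorFlow deep learning frameworks.

Key Features

Post-Training Compression Algorithms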

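Compression algorithm	OpenVINO	PyTorch	TorchFX	TensorFlow	ONNX
Post-Training Quantization	Supported	Supported	Experimental	Supported	Supported
Weight Compression	Supported	Supported	Experimental	Not supported	Not supported
Activation Sparsity	Not supported	Experimental	Not supported	Not supported	Not supported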

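Training-Time Compression Algorithms

Compression algorithm	PyTorch	TensorFlow
Quantization-Aware Training	Supported	Supported
Mixed-Precision Quantization	Supported	Not supported
Sparsity	Supported	Supported
Filter Pruning	Supported	Supported
Movement Pruning	Experimental	Not supported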

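- Automatic, configurable model graph transformation to obtain the compressed model.
  NOTE: TensorFlow models have limited support. Only models created with the Sequential or Keras Functional API are supported.
- Common interface for compression methods.
- GPU-accelerated layers for faster fine-tuning of compressed models.
- Distributed training support.
- Git patch for a prominent third-party repository (huggingface-transformers) demonstrating how to integrate NNCF into custom training pipelines.
- Seamless combination of pruning, sparsity, and quantization algorithms. See Optimum Intel for examples of joint (movement) pruning, quantization, and distillation (JPQD), end to end from NNCF optimization to compressed OpenVINO IR.
- Export of PyTorch compressed models to ONNX* checkpoints and of TensorFlow compressed models to SavedModel or Frozen Graph format, ready to use with the OpenVINO™ toolkit.
- Support for accuracy-aware model training pipelines via Adaptive Compression Level Training and Early Exit Training.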
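
The idea behind adaptive compression level training with early exit can be illustrated with a small, framework-free sketch. This is not NNCF's implementation: the function name `adaptive_compression_search`, the `evaluate` callback, and the toy accuracy numbers are all hypothetical, and the sketch assumes that accuracy only gets worse as the compression level grows.

```python
from typing import Callable, List, Tuple


def adaptive_compression_search(
    evaluate: Callable[[float], float],
    baseline_accuracy: float,
    max_accuracy_drop: float,
    levels: List[float],
) -> Tuple[float, float]:
    """Return the highest compression level whose accuracy stays within budget.

    `evaluate(level)` is a hypothetical callback standing in for
    "fine-tune the model at this compression level, then validate it".
    """
    best_level, best_accuracy = 0.0, baseline_accuracy
    for level in sorted(levels):
        accuracy = evaluate(level)
        if baseline_accuracy - accuracy > max_accuracy_drop:
            # Early exit: higher levels are assumed to degrade accuracy further.
            break
        best_level, best_accuracy = level, accuracy
    return best_level, best_accuracy


# Toy accuracy table (made-up numbers) used in place of real training runs.
toy_accuracy = {0.1: 0.760, 0.3: 0.757, 0.5: 0.745, 0.7: 0.700}
result = adaptive_compression_search(
    evaluate=lambda level: toy_accuracy[level],
    baseline_accuracy=0.76,
    max_accuracy_drop=0.01,
    levels=list(toy_accuracy),
)
print(result)
```

In a real pipeline the callback would run fine-tuning epochs, so the early exit saves the cost of training at compression levels that cannot meet the accuracy budget.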

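Documentation

This documentation covers detailed information about the NNCF algorithms and functions needed to contribute to NNCF.

The latest user documentation for NNCF is available online.

The NNCF API documentation is also available online.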

Usage

Post-Training Quantization

NNCF PTQ is the simplest way to apply 8-bit quantization. To run the algorithm, you only need your model and a small (~300 samples) calibration dataset.
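
To make the role of the calibration dataset concrete, here is a minimal, framework-free sketch of 8-bit affine quantization driven by min/max statistics. It is a conceptual illustration only and does not reproduce NNCF's algorithms; all function names are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Iterable, List, Tuple


def calibrate_min_max(samples: Iterable[List[float]]) -> Tuple[float, float]:
    """Collect the global minimum and maximum over all calibration samples."""
    lo, hi = float("inf"), float("-inf")
    for sample in samples:
        lo = min(lo, min(sample))
        hi = max(hi, max(sample))
    return lo, hi


def quantization_params(lo: float, hi: float, num_bits: int = 8) -> Tuple[float, int]:
    """Derive scale and zero point for an unsigned affine quantizer."""
    # Widen the range if needed so that 0.0 is always representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, num_bits: int = 8) -> int:
    """Map a real value to an integer code, clamping to the valid range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer code back to an approximate real value."""
    return (q - zero_point) * scale


# Made-up "activations" standing in for statistics gathered during calibration.
calibration_samples = [[-1.0, 0.5, 2.0], [0.25, 3.0, -0.75]]
lo, hi = calibrate_min_max(calibration_samples)
scale, zero_point = quantization_params(lo, hi)
restored = dequantize(quantize(0.5, scale, zero_point), scale, zero_point)
```

The round trip loses at most about half a quantization step per value, which is why a small but representative calibration set is enough to choose a usable range.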

OpenVINO is the preferred backend for running PTQ, while PyTorch, TensorFlow, and ONNX are also supported.

OpenVINO

import nncf
import openvino.runtime as ov
import torch
from torchvision import datasets, transforms

# Instantiate your uncompressed model
model = ov.Core().read_model("/model_path")

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)

# Step 1: Initialize transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)
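
The pairing of a data source with a transformation function, as in Steps 1 and 2 above, can be pictured with a small stand-in class. This is a hypothetical sketch of the pattern, not the actual `nncf.Dataset` implementation: the class `CalibrationDataset` and its method `iter_model_inputs` are invented for illustration.

```python
from typing import Any, Callable, Iterable, Iterator, Optional


class CalibrationDataset:
    """Hypothetical stand-in: wraps any iterable and applies transform_fn lazily."""

    def __init__(
        self,
        data_source: Iterable[Any],
        transform_fn: Optional[Callable[[Any], Any]] = None,
    ):
        self._data_source = data_source
        # With no transform, items are passed to the model unchanged.
        self._transform_fn = transform_fn if transform_fn is not None else (lambda item: item)

    def iter_model_inputs(self, limit: Optional[int] = None) -> Iterator[Any]:
        """Yield model-ready inputs, optionally stopping after `limit` items."""
        for index, item in enumerate(self._data_source):
            if limit is not None and index >= limit:
                break
            yield self._transform_fn(item)


def transform_fn(data_item):
    # Each item is an (input, label) pair; calibration only needs the input.
    images, _ = data_item
    return images


# Strings stand in for image tensors in this toy example.
loader = [("img0", 0), ("img1", 1), ("img2", 0)]
inputs = list(CalibrationDataset(loader, transform_fn).iter_model_inputs(limit=2))
```

Keeping the transformation separate from the data source means the same validation loader can be reused for calibration without changing how it yields labels.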

PyTorch

import nncf
import torch
from torchvision import datasets, models, transforms

# Instantiate your uncompressed model
model = models.mobilenet_v2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset)

# Step 1: Initialize the transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)

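NOTE: If the Post-Training Quantization algorithm does not meet your quality requirements, you can fine-tune the quantized PyTorch model. NNCF provides an example of a Quantization-Aware Training pipeline for a PyTorch model.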

TorchFX

import nncf
import torch.fx
from torchvision import datasets, models, transforms
from nncf.torch import disable_patching

# Instantiate your uncompressed model
model = models.mobilenet_v2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset)

# Step 1: Initialize the transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)

# Step 3: Export model to TorchFX
input_shape = (1, 3, 224, 224)
ex_input = torch.ones(input_shape)  # example input used only to trace the model
with disable_patching():
    fx_model = torch.export.export_for_training(model, args=(ex_input,)).module()
    # or
    # fx_model = torch.export.export(model, args=(ex_input,)).module()

    # Step 4: Run the quantization pipeline
    quantized_fx_model = nncf.quantize(fx_model, calibration_dataset)

TensorFlow

import nncf
import tensorflow as tf
import tensorflow_datasets as tfds

# Instantiate your uncompressed model
model = tf.keras.applications.MobileNetV2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = tfds.load("/path", split="validation",
                        shuffle_files=False, as_supervised=True)

# Step 1: Initialize transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(val_dataset, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)

ONNX

import onnx
import nncf
import torch
from torchvision import datasets, transforms

# Instantiate your uncompressed model
onnx_model = onnx.load_model("/model_path")

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)

# Step 1: Initialize transformation function
input_name = onnx_model.graph.input[0].name
def transform_fn(data_item):
    images, _ = data_item
    # ONNX models take a dict that maps input names to NumPy arrays
    return {input_name: images.numpy()}

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(onnx_model, calibration_dataset)

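Training-Time Quantization

Below is an example of an accuracy-aware quantization pipeline in which model weights and compression parameters can be fine-tuned to achieve higher accuracy.

PyTorch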

import nncf
import torch
from torchvision import datasets, models, transforms

# Instantiate your uncompressed model
model = models.mobilenet_v2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset)

# Step 1: Initialize the transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)

# Now use quantized_model as a usual torch.nn.Module
# to fine-tune compression parameters along with the model weights

# Save quantization modules and the quantized model parameters
checkpoint = {
    'state_dict': quantized_model.state_dict(),
    'nncf_config': quantized_model.nncf.get_config(),
    # ... the rest of the user-defined objects to save
}
torch.save(checkpoint, path_to_checkpoint)  # path_to_checkpoint is user-defined

# ...

# Load quantization modules and the quantized model parameters
resuming_checkpoint = torch.load(path_to_checkpoint)
nncf_config = resuming_checkpoint['nncf_config']
state_dict = resuming_checkpoint['state_dict']

# example_input is a user-provided tensor that matches the model input
quantized_model = nncf.torch.load_from_config(model, nncf_config, example_input)
quantized_model.load_state_dict(state_dict)
# ... the rest of the usual PyTorch-powered training pipeline

Training-Time Compression

Below is an example of an accuracy-aware RB sparsification pipeline in which model weights and compression parameters can be fine-tuned to achieve higher accuracy.

PyTorch

import torch
import nncf.torch  # Important - must be imported before any other external package that depends on torch

from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Instantiate your uncompressed model
from torchvision.models.resnet import resnet50
model = resnet50()

# Load a configuration file to specify compression
nncf_config = NNCFConfig.from_json("resnet50_imagenet_rb_sparsity.json")

# Provide data loaders for compression algorithm initialization, if necessary
import torchvision.datasets as datasets
import torchvision.transforms as transforms
representative_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
init_loader = torch.utils.data.DataLoader(representative_dataset)
nncf_config = register_default_init_args(nncf_config, init_loader)

# Apply the specified compression algorithms to the model
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# Now use compressed_model as a usual torch.nn.Module
# to fine-tune compression parameters along with the model weights

# ... the rest of the usual PyTorch-powered training pipeline

# Export to ONNX or .pth when done fine-tuning
compression_ctrl.export_model("compressed_model.onnx")
torch.save(compressed_model.state_dict(), "compressed_model.pth")

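NOTE (PyTorch): Due to the way NNCF works within the PyTorch backend, import nncf must be done before any other import of torch in your package or in third-party packages that your code uses. Otherwise, the compression may be applied incompletely.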
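
The example loads its settings from resnet50_imagenet_rb_sparsity.json, whose contents are not shown here. The sketch below writes a minimal file of that kind, assuming the usual NNCF JSON layout with an `input_info` section and a `compression` section. The specific values and the helper `write_config` are illustrative assumptions, not the contents of the real sample config.

```python
import json
import tempfile
from pathlib import Path

# Assumed minimal layout: input shape plus one compression algorithm entry.
example_config = {
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {
        "algorithm": "rb_sparsity",
        "sparsity_init": 0.01,  # illustrative starting sparsity level
    },
}


def write_config(config: dict, directory: str) -> Path:
    """Write the config as JSON and return the path of the created file."""
    path = Path(directory) / "resnet50_imagenet_rb_sparsity.json"
    path.write_text(json.dumps(config, indent=4))
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = write_config(example_config, tmp_dir)
    loaded = json.loads(config_path.read_text())
```

A file produced this way could then be passed to `NNCFConfig.from_json`, as in the example; real configs typically carry more algorithm-specific parameters.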

TensorFlow

import tensorflow as tf

from nncf import NNCFConfig
from nncf.tensorflow import create_compressed_model, register_default_init_args

# Instantiate your uncompressed model
from tensorflow.keras.applications import ResNet50
model = ResNet50()

# Load a configuration file to specify compression
nncf_config = NNCFConfig.from_json("resnet50_imagenet_rb_sparsity.json")

# Provide dataset for compression algorithm initialization
representative_dataset = tf.data.Dataset.list_files("/path/*.jpeg")
nncf_config = register_default_init_args(nncf_config, representative_dataset, batch_size=1)

# Apply the specified compression algorithms to the model
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# Now use compressed_model as a usual Keras model
# to fine-tune compression parameters along with the model weights

# ... the rest of the usual TensorFlow-powered training pipeline

# Export to Frozen Graph, TensorFlow SavedModel or .h5 when done fine-tuning
compression_ctrl.export_model("compressed_model.pb", save_format="frozen_graph")

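For a more detailed description of NNCF usage in your training code, see the NNCF usage tutorial.

Demos, Tutorials and Samples

For a quicker start with NNCF-powered compression, try the sample notebooks and scripts presented below.

Jupyter* Notebook Tutorials and Demos

Ready-to-run Jupyter* notebook tutorials and demos explain and demonstrate NNCF compression algorithms for optimizing models for inference with the OpenVINO toolkit: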

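Notebook tutorial name	Compression algorithm	Backend	Domain
BERT Quantization	Post-Training Quantization	OpenVINO	NLP
MONAI Segmentation Model Quantization	Post-Training Quantization	OpenVINO	Segmentation
PyTorch Model Quantization	Post-Training Quantization	PyTorch	Image classification
Quantization with Accuracy Control	Post-Training Quantization with Accuracy Control	OpenVINO	Speech-to-text, object detection
PyTorch Training-Time Compression	Training-Time Compression	PyTorch	Image classification
TensorFlow Training-Time Compression	Training-Time Compression	TensorFlow	Image classification
Joint Pruning, Quantization and Distillation for BERT	Joint Pruning, Quantization and Distillation	OpenVINO	NLP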

Notebooks demonstrating OpenVINO conversion and inference together with NNCF compression for models from various domains:

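Demo model	Compression algorithm	Backend	Domain
YOLOv8	Post-Training Quantization	OpenVINO	Object detection, keypoint detection, instance segmentation
EfficientSAM	Post-Training Quantization	OpenVINO	Image segmentation
Segment Anything Model	Post-Training Quantization	OpenVINO	Image segmentation
OneFormer	Post-Training Quantization	OpenVINO	Image segmentation
InstructPix2Pix	Post-Training Quantization	OpenVINO	Image-to-image
CLIP	Post-Training Quantization	OpenVINO	Image-to-text
BLIP	Post-Training Quantization	OpenVINO	Image-to-text
Latent Consistency Model	Post-Training Quantization	OpenVINO	Text-to-image
ControlNet QR Code Monster	Post-Training Quantization	OpenVINO	Text-to-image
SDXL-turbo	Post-Training Quantization	OpenVINO	Text-to-image, image-to-image
Distil-Whisper	Post-Training Quantization	OpenVINO	Speech-to-text
Whisper	Post-Training Quantization	OpenVINO	Speech-to-text
MMS Speech Recognition	Post-Training Quantization	OpenVINO	Speech-to-text
Grammar Error Correction	Post-Training Quantization	OpenVINO	NLP, grammar correction
LLM Instruction Following	Weight Compression	OpenVINO	NLP, instruction following
LLM Chat Bots	Weight Compression	OpenVINO	NLP, chat bot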

Post-Training Quantization Examples

Compact scripts demonstrating quantization and the corresponding inference speed boost:

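Example name	Compression algorithm	Backend	Domain
OpenVINO MobileNetV2	Post-Training Quantization	OpenVINO	Image classification
OpenVINO YOLOv8	Post-Training Quantization	OpenVINO	Object detection
OpenVINO YOLOv8 QwAC	Post-Training Quantization with Accuracy Control	OpenVINO	Object detection
OpenVINO Anomaly Classification	Post-Training Quantization with Accuracy Control	OpenVINO	Anomaly classification
PyTorch MobileNetV2	Post-Training Quantization	PyTorch	Image classification
PyTorch SSD	Post-Training Quantization	PyTorch	Object detection
TorchFX Resnet18	Post-Training Quantization	TorchFX	Image classification
TensorFlow MobileNetV2	Post-Training Quantization	TensorFlow	Image classification
ONNX MobileNetV2	Post-Training Quantization	ONNX	Image classification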
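
An inference speed-up such as the one these scripts report can be estimated by timing the original and the quantized model on the same input. The following is a generic, hedged sketch using only the standard library. The helpers `median_latency` and `speedup` are hypothetical and are not taken from the NNCF examples, and the two lambdas are placeholders for real model calls.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Return the median wall-clock time in seconds of calling fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (caches, lazy initialization)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_fn: Callable[[], object], optimized_fn: Callable[[], object],
            warmup: int = 3, runs: int = 20) -> float:
    """Ratio of baseline latency to optimized latency (values above 1 mean faster)."""
    baseline = median_latency(baseline_fn, warmup, runs)
    optimized = median_latency(optimized_fn, warmup, runs)
    return baseline / optimized


# Placeholder workloads; in practice these would call the two models on one input.
ratio = speedup(lambda: sum(range(20000)), lambda: sum(range(10000)), warmup=1, runs=5)
```

The median is used instead of the mean so that occasional slow runs caused by other system activity have less influence on the result.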

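Training-Time Compression Examples

Examples of full pipelines, including compression, training, and inference, for classification, detection, and segmentation tasks: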

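Example name	Compression algorithm	Backend	Domain
PyTorch Image Classification	Training-Time Compression	PyTorch	Image classification
PyTorch Object Detection	Training-Time Compression	PyTorch	Object detection
PyTorch Semantic Segmentation	Training-Time Compression	PyTorch	Semantic segmentation
TensorFlow Image Classification	Training-Time Compression	TensorFlow	Image classification
TensorFlow Object Detection	Training-Time Compression	TensorFlow	Object detection
TensorFlow Instance Segmentation	Training-Time Compression	TensorFlow	Instance segmentation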

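Third-party Repository Integration

NNCF can be easily integrated into the training and evaluation pipelines of third-party repositories.

Used by

OpenVINO Training Extensions
NNCF is integrated into OpenVINO Training Extensions as a model optimization backend. You can train, optimize, and export new models based on the available model templates, as well as run the exported models with OpenVINO.
HuggingFace Optimum Intel
NNCF is used as a compression backend within the well-known transformers repository in HuggingFace Optimum Intel.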

Installation Guide

For detailed installation instructions, refer to the Installation guide.

NNCF can be installed as a regular PyPI package via pip:

pip install nncf

NNCF is also available via conda:

conda install -c conda-forge nncf

System Requirements

Ubuntu* 18.04 or later (64-bit)
Python* 3.9 or later
Supported frameworks:
- PyTorch* >=2.4, <2.6
- TensorFlow* >=2.8.4, <=2.15.1
- ONNX* ==1.17.0
- OpenVINO* >=2022.3.0

This repository is tested on Python* 3.10.14, PyTorch* 2.5.0 (NVidia CUDA* Toolkit 12.4) and TensorFlow* 2.12.1 (NVidia CUDA* Toolkit 11.8).
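
Before installing, it can help to check an environment against the ranges listed above. The sketch below does this with the standard library only. The helpers `parse_version` and `in_range` are hypothetical, and the simplified parsing ignores pre-release tags, so treat it as a rough check and not a replacement for pip's own dependency resolution.

```python
import itertools
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.5.0' or '2.5.0+cu124' into (2, 5, 0), dropping non-numeric suffixes."""
    parts = []
    for chunk in text.split("+")[0].split("."):
        digits = "".join(itertools.takewhile(str.isdigit, chunk))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def in_range(version: str, lower: str, upper: str, upper_inclusive: bool = False) -> bool:
    """Check lower <= version < upper (or <= upper when upper_inclusive is True)."""
    parsed = parse_version(version)
    low, high = parse_version(lower), parse_version(upper)
    if parsed < low:
        return False
    return parsed <= high if upper_inclusive else parsed < high


# The Python requirement from the list above.
python_ok = sys.version_info >= (3, 9)

# Example checks against the ranges listed above, using made-up installed versions.
pytorch_ok = in_range("2.5.0+cu124", "2.4", "2.6")
tensorflow_ok = in_range("2.15.1", "2.8.4", "2.15.1", upper_inclusive=True)
```

The installed versions would normally come from `importlib.metadata.version("torch")` and similar calls; fixed strings are used here so the sketch runs without those packages.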

NNCF Compressed Model Zoo

The list of models and the compression results can be found on the NNCF Model Zoo page.

Citing

@article{kozlov2020neural,
    title =   {Neural network compression framework for fast model inference},
    author =  {Kozlov, Alexander and Lazarevich, Ivan and Shamporov, Vasily and Lyalyushkin, Nikolay and Gorbachev, Yury},
    journal = {arXiv preprint arXiv:2002.08679},
    year =    {2020}
}