Current CI status:

PyTorch/XLA is a Python package that uses the XLA deep learning compiler to connect the PyTorch deep learning framework and Cloud TPUs. You can try it for free on a single Cloud TPU VM with Kaggle!

Take a look at one of our Kaggle notebooks to get started.

To install PyTorch/XLA stable build in a new TPU VM:
```
pip install torch~=2.5.0 torch_xla[tpu]~=2.5.0 -f https://storage.googleapis.com/libtpu-releases/index.html
```
To install PyTorch/XLA nightly build in a new TPU VM:
```
pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
pip install 'torch_xla[tpu] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl' -f https://storage.googleapis.com/libtpu-releases/index.html
```
PyTorch/XLA now provides GPU support through a plugin package similar to libtpu:

```
pip install torch~=2.5.0 torch_xla~=2.5.0 https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.5.0-py3-none-any.whl
```
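Whichever backend you installed (TPU or the CUDA plugin), a quick smoke test is to create a tensor on the XLA device. This is just a minimal sketch for verification, not part of the official instructions:

```python
import torch
import torch_xla.core.xla_model as xm

# Resolves to whichever XLA backend is available (TPU, CUDA, or CPU).
device = xm.xla_device()
t = torch.randn(2, 2, device=device)
print(t.device)  # e.g. "xla:0"
```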
To update your existing training loop, make the following changes:
```diff
-import torch.multiprocessing as mp
+import torch_xla as xla
+import torch_xla.core.xla_model as xm

 def _mp_fn(index):
   ...

+  # Move the model parameters to your XLA device
+  model.to(xla.device())

   for inputs, labels in train_loader:
+    with xla.step():
+      # Transfer data to the XLA device. This happens asynchronously.
+      inputs, labels = inputs.to(xla.device()), labels.to(xla.device())
       optimizer.zero_grad()
       outputs = model(inputs)
       loss = loss_fn(outputs, labels)
       loss.backward()
-      optimizer.step()
+      # `xm.optimizer_step` combines gradients across replicas
+      xm.optimizer_step(optimizer)

 if __name__ == '__main__':
-  mp.spawn(_mp_fn, args=(), nprocs=world_size)
+  # xla.launch automatically selects the correct world size
+  xla.launch(_mp_fn, args=())
```
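For reference, here is a minimal self-contained sketch of what the loop looks like once the diff above is applied. The model, optimizer, and synthetic data are placeholders chosen for illustration; only the torch_xla calls come from the diff itself:

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torch_xla as xla
import torch_xla.core.xla_model as xm


def _mp_fn(index):
    # Placeholder model and synthetic data; substitute your own.
    model = nn.Linear(10, 2).to(xla.device())
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    train_loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,)))
                    for _ in range(10)]

    for inputs, labels in train_loader:
        # xla.step() traces the iteration lazily and compiles/executes
        # it when the context exits.
        with xla.step():
            inputs, labels = inputs.to(xla.device()), labels.to(xla.device())
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            # Combines gradients across replicas before applying the update.
            xm.optimizer_step(optimizer)


if __name__ == '__main__':
    # One process per local XLA device; world size is selected automatically.
    xla.launch(_mp_fn, args=())
```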
If you're using DistributedDataParallel, make the following changes:
```diff
 import torch.distributed as dist
-import torch.multiprocessing as mp
+import torch_xla as xla
+import torch_xla.distributed.xla_backend

 def _mp_fn(rank):
   ...

-  os.environ['MASTER_ADDR'] = 'localhost'
-  os.environ['MASTER_PORT'] = '12355'
-  dist.init_process_group("gloo", rank=rank, world_size=world_size)
+  # Rank and world size are inferred from the XLA device runtime
+  dist.init_process_group("xla", init_method='xla://')
+
+  model.to(xm.xla_device())
+  # `gradient_as_bucket_view=True` required for XLA
+  ddp_model = DDP(model, gradient_as_bucket_view=True)
-  model = model.to(rank)
-  ddp_model = DDP(model, device_ids=[rank])

   for inputs, labels in train_loader:
+    with xla.step():
+      inputs, labels = inputs.to(xla.device()), labels.to(xla.device())
       optimizer.zero_grad()
       outputs = ddp_model(inputs)
       loss = loss_fn(outputs, labels)
       loss.backward()
       optimizer.step()

 if __name__ == '__main__':
-  mp.spawn(_mp_fn, args=(), nprocs=world_size)
+  xla.launch(_mp_fn, args=())
```
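Analogously, a minimal runnable sketch of the DDP variant; again the model and data are stand-ins, and the torch_xla-specific pieces mirror the diff above:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP
import torch_xla as xla
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the "xla" backend


def _mp_fn(rank):
    # Rank and world size are inferred from the XLA device runtime.
    dist.init_process_group("xla", init_method='xla://')

    model = nn.Linear(10, 2).to(xm.xla_device())
    # gradient_as_bucket_view=True is required for XLA.
    ddp_model = DDP(model, gradient_as_bucket_view=True)
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    train_loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,)))
                    for _ in range(10)]

    for inputs, labels in train_loader:
        with xla.step():
            inputs, labels = inputs.to(xla.device()), labels.to(xla.device())
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(inputs), labels)
            loss.backward()
            # DDP has already all-reduced gradients, so a plain step suffices.
            optimizer.step()


if __name__ == '__main__':
    xla.launch(_mp_fn, args=())
```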
Additional information on PyTorch/XLA, including a description of its semantics and functions, is available on pytorch.org. See the API Guide for best practices when writing networks that run on XLA devices (TPU, CUDA, CPU, and...).

Our comprehensive user guides are available at:

Documentation for the latest release

Documentation for master branch
PyTorch/XLA releases starting with version r2.1 will be available on PyPI. You can now install the main build with `pip install torch_xla`. To also install the Cloud TPU plugin corresponding to your installed `torch_xla`, install the optional `tpu` dependencies after installing the main build:
```
pip install torch_xla[tpu] -f https://storage.googleapis.com/libtpu-releases/index.html
```
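To confirm that the TPU plugin is actually being picked up, one possible check (a sketch; `torch_xla.runtime` is assumed to be available as in recent releases, and the output depends on your VM) is:

```python
import torch_xla.runtime as xr

# Reports which XLA backend the runtime resolved, e.g. "TPU".
print(xr.device_type())
# Total number of addressable devices across the runtime.
print(xr.global_runtime_device_count())
```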
GPU and nightly builds are available in our public GCS bucket.
Version | Cloud GPU VM Wheels |
---|---|
2.5 (CUDA 12.1 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.1 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.4 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.4 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.4 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl |
nightly (Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp38-cp38-linux_x86_64.whl |
nightly (Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl |
nightly (CUDA 12.1 + Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.6.0.dev-cp38-cp38-linux_x86_64.whl |
```
pip3 install torch==2.6.0.dev20240925+cpu --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-nightly%2B20240925-cp310-cp310-linux_x86_64.whl
```

The torch wheel version `2.6.0.dev20240925+cpu` can be found here: https://download.pytorch.org/whl/nightly/torch/.
You can also add `yyyymmdd` after `torch_xla-2.6.0.dev` to get the nightly wheel of a specified date. Here is an example:
```
pip3 install torch==2.5.0.dev20240820+cpu --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.5.0.dev20240820-cp310-cp310-linux_x86_64.whl
```

The torch wheel version `2.5.0.dev20240820+cpu` can be found here: https://download.pytorch.org/whl/nightly/torch/.
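To make the naming scheme explicit, this small hypothetical helper assembles a dated TPU VM nightly wheel URL from the pattern used in the examples above (the bucket layout and version prefix may change between releases):

```python
# Hypothetical helper; mirrors the URL pattern from the examples above.
def nightly_wheel_url(date: str, version: str = "2.5.0",
                      python_tag: str = "cp310") -> str:
    base = "https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm"
    return (f"{base}/torch_xla-{version}.dev{date}-"
            f"{python_tag}-{python_tag}-linux_x86_64.whl")

print(nightly_wheel_url("20240820"))
```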
Version | Cloud TPU VMs Wheel |
---|---|
2.4 (Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.3 (Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.3.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.2 (Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.2.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.1 (XRT + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/xrt/tpuvm/torch_xla-2.1.0%2Bxrt-cp310-cp310-manylinux_2_28_x86_64.whl |
2.1 (Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.1.0-cp38-cp38-linux_x86_64.whl |
Version | GPU Wheel |
---|---|
2.5 (CUDA 12.1 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.1 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.4 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.4 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.5 (CUDA 12.4 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl |
2.4 (CUDA 12.1 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp39-cp39-manylinux_2_28_x86_64.whl |
2.4 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.4 (CUDA 12.1 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp311-cp311-manylinux_2_28_x86_64.whl |
2.3 (CUDA 12.1 + Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp38-cp38-manylinux_2_28_x86_64.whl |
2.3 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.3 (CUDA 12.1 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp311-cp311-manylinux_2_28_x86_64.whl |
2.2 (CUDA 12.1 + Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.2.0-cp38-cp38-manylinux_2_28_x86_64.whl |
2.2 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.2.0-cp310-cp310-manylinux_2_28_x86_64.whl |
2.1 + CUDA 11.8 | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/11.8/torch_xla-2.1.0-cp38-cp38-manylinux_2_28_x86_64.whl |
nightly + CUDA 12.0 >= 2023/06/27 | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.0/torch_xla-nightly-cp38-cp38-linux_x86_64.whl |
Version | Cloud TPU VMs Docker |
---|---|
2.5 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_tpuvm |
2.4 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_tpuvm |
2.3 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_tpuvm |
2.2 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.2.0_3.10_tpuvm |
2.1 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_tpuvm |
nightly python | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm |
To use the above dockers, please pass `--privileged --net host --shm-size=16G` along. Here is an example:

```
docker run --privileged --net host --shm-size=16G -it us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm /bin/bash
```
Version | GPU CUDA 12.4 Docker |
---|---|
2.5 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_cuda_12.4 |
2.4 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_cuda_12.4 |
Version | GPU CUDA 12.1 Docker |
---|---|
2.5 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_cuda_12.1 |
2.4 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_cuda_12.1 |
2.3 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_cuda_12.1 |
2.2 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.2.0_3.10_cuda_12.1 |
2.1 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_12.1 |
nightly | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1 |
nightly at date | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1_YYYYMMDD |
Version | GPU CUDA 11.8 + Docker |
---|---|
2.1 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_11.8 |
2.0 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.0_3.8_cuda_11.8 |
To run on compute instances with GPUs.

If PyTorch/XLA isn't performing as expected, see the troubleshooting guide, which has suggestions for debugging and optimizing your network(s).

The PyTorch/XLA team is always happy to hear from users and OSS contributors! The best way to get in touch is by filing an issue on this GitHub. Questions, bug reports, feature requests, build issues, etc. are all welcome!

See the contribution guide.

This repository is jointly operated and maintained by Google, Meta, and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to [email protected]. For questions directed at Google, please send an email to [email protected]. For all other questions, please open up an issue in this repository here.

You can find additional useful reading materials in: