xla下載 - xla源代碼下載

xla

其他源碼

1.0.0

下載

Pytorch/XLA

當前的CI狀態：

Pytorch/XLA是一個Python軟件包，它使用XLA深度學習編譯器連接Pytorch深度學習框架和雲TPU。您可以立即在帶有Kaggle的單個雲TPU VM上免費嘗試！

看一下我們的Kaggle筆記本之一開始：

使用Pytorch/XLA 2.0穩定擴散
分佈式Pytorch/XLA基礎知識

安裝

TPU

要在新的TPU VM中安裝Pytorch/XLA穩定構建：

 pip install torch~=2.5.0 torch_xla[tpu]~=2.5.0 -f https://storage.googleapis.com/libtpu-releases/index.html

在新的TPU VM中安裝Pytorch/XLA每晚構建：

 pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
pip install 'torch_xla[tpu] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl' -f https://storage.googleapis.com/libtpu-releases/index.html

GPU插件

Pytorch/XLA現在通過類似於libtpu的插件包提供GPU支持：

 pip install torch~=2.5.0 torch_xla~=2.5.0 https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.5.0-py3-none-any.whl

入門

要更新您現有的培訓循環，請進行以下更改：

 - import torch.multiprocessing as mp
+ import torch_xla as xla
+ import torch_xla.core.xla_model as xm

 def _mp_fn(index):
   ...

+  # Move the model paramters to your XLA device
+  model.to(xla.device())

   for inputs, labels in train_loader:
+    with xla.step():
+      # Transfer data to the XLA device. This happens asynchronously.
+      inputs, labels = inputs.to(xla.device()), labels.to(xla.device())
       optimizer.zero_grad()
       outputs = model(inputs)
       loss = loss_fn(outputs, labels)
       loss.backward()
-      optimizer.step()
+      # `xm.optimizer_step` combines gradients across replicas
+      xm.optimizer_step(optimizer)

 if __name__ == '__main__':
-  mp.spawn(_mp_fn, args=(), nprocs=world_size)
+  # xla.launch automatically selects the correct world size
+  xla.launch(_mp_fn, args=())

如果您使用的是DistributedDataParallel ，請進行以下更改：

 import torch.distributed as dist
- import torch.multiprocessing as mp
+ import torch_xla as xla
+ import torch_xla.distributed.xla_backend

 def _mp_fn(rank):
   ...

-  os.environ['MASTER_ADDR'] = 'localhost'
-  os.environ['MASTER_PORT'] = '12355'
-  dist.init_process_group("gloo", rank=rank, world_size=world_size)
+  # Rank and world size are inferred from the XLA device runtime
+  dist.init_process_group("xla", init_method='xla://')
+
+  model.to(xm.xla_device())
+  # `gradient_as_bucket_view=True` required for XLA
+  ddp_model = DDP(model, gradient_as_bucket_view=True)

-  model = model.to(rank)
-  ddp_model = DDP(model, device_ids=[rank])

   for inputs, labels in train_loader:
+    with xla.step():
+      inputs, labels = inputs.to(xla.device()), labels.to(xla.device())
       optimizer.zero_grad()
       outputs = ddp_model(inputs)
       loss = loss_fn(outputs, labels)
       loss.backward()
       optimizer.step()

 if __name__ == '__main__':
-  mp.spawn(_mp_fn, args=(), nprocs=world_size)
+  xla.launch(_mp_fn, args=())

有關Pytorch/XLA的其他信息，包括其語義和功能的描述，可在pytorch.org上獲得。在編寫XLA設備（TPU，CUDA，CPU和...）上運行的網絡時，請參見“最佳實踐指南”。

我們的全面用戶指南可用：

Pytorch/XLA教程

雲TPU VM QuickStart
雲TPU POD SLICE快速啟動
在TPU VM上進行分析
GPU指南

可用的碼頭圖像和車輪

Python包

Pytorch/XLA從版本R2.1開始的版本將在PYPI上提供。現在，您可以使用pip install torch_xla安裝主構建。要安裝與您tpu安裝的torch_xla相對應的Cloud TPU插件

 pip install torch_xla[tpu] -f https://storage.googleapis.com/libtpu-releases/index.html

我們的公共GCS存儲桶中有GPU和夜間構建。

版本	雲GPU VM車輪
2.5（Cuda 12.1 + Python 3.9）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.1 + Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.1 + Python 3.11）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.4 + Python 3.9）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.4 + Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.4 + Python 3.11）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl`
每晚（Python 3.8）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp38-cp38-linux_x86_64.whl`
每晚（Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl`
每晚（CUDA 12.1 + Python 3.8）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.6.0.dev-cp38-cp38-linux_x86_64.whl`

在08/13/2024之前使用夜間構建

您也可以在“ Torch_xla-nightly”之後添加“+Yyyymmdd”，以獲取指定日期的夜間輪子。這是一個示例：

 pip3 install torch==2.6.0.dev20240925+cpu --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-nightly%2B20240925-cp310-cp310-linux_x86_64.whl

可以在https://download.pytorch.org/whl/nightly/torch/上找到火炬輪版2.6.0.dev20240925+cpu 。

在08/20/2024之後使用夜間構建

您也可以在torch_xla-2.6.0.dev之後添加yyyymmdd以獲取指定日期的夜間輪。這是一個示例：

 pip3 install torch==2.5.0.dev20240820+cpu --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.5.0.dev20240820-cp310-cp310-linux_x86_64.whl

可以在https://download.pytorch.org/whl/nightly/torch/上找到火炬輪版2.6.0.dev20240925+cpu 。

較舊的版本

版本	雲TPU VMS輪
2.4（Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.3（Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.3.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.2（Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.2.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.1（XRT + Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/xrt/tpuvm/torch_xla-2.1.0%2Bxrt-cp310-cp310-manylinux_2_28_x86_64.whl`
2.1（Python 3.8）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.1.0-cp38-cp38-linux_x86_64.whl`

版本	GPU輪
2.5（Cuda 12.1 + Python 3.9）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.1 + Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.1 + Python 3.11）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.4 + Python 3.9）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.4 + Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.5（Cuda 12.4 + Python 3.11）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl`
2.4（Cuda 12.1 + Python 3.9）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp39-cp39-manylinux_2_28_x86_64.whl`
2.4（Cuda 12.1 + Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.4（Cuda 12.1 + Python 3.11）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp311-cp311-manylinux_2_28_x86_64.whl`
2.3（Cuda 12.1 + Python 3.8）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp38-cp38-manylinux_2_28_x86_64.whl`
2.3（Cuda 12.1 + Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.3（Cuda 12.1 + Python 3.11）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp311-cp311-manylinux_2_28_x86_64.whl`
2.2（Cuda 12.1 + Python 3.8）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.2.0-cp38-cp38-manylinux_2_28_x86_64.whl`
2.2（Cuda 12.1 + Python 3.10）	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.2.0-cp310-cp310-manylinux_2_28_x86_64.whl`
2.1 + CUDA 11.8	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/11.8/torch_xla-2.1.0-cp38-cp38-manylinux_2_28_x86_64.whl`
夜間 + cuda 12.0> = 2023/06/27	`https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.0/torch_xla-nightly-cp38-cp38-linux_x86_64.whl`

Docker

版本	雲TPU VMS碼頭
2.5	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_tpuvm`
2.4	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_tpuvm`
2.3	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_tpuvm`
2.2	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.2.0_3.10_tpuvm`
2.1	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_tpuvm`
夜間python	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm`

要使用上述碼頭機，請通過--privileged --net host --shm-size=16G 。這是一個示例：

docker run --privileged --net host --shm-size=16G -it us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm /bin/bash

版本	GPU CUDA 12.4 Docker
2.5	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_cuda_12.4`
2.4	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_cuda_12.4`

版本	GPU CUDA 12.1 DOCKER
2.5	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_cuda_12.1`
2.4	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_cuda_12.1`
2.3	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_cuda_12.1`
2.2	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.2.0_3.10_cuda_12.1`
2.1	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_12.1`
每晚	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1`
每晚約會	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1_YYYYMMDD`

版本	GPU CUDA 11.8 + DOCKER
2.1	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_11.8`
2.0	`us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.0_3.8_cuda_11.8`

使用GPU在計算實例上運行。

故障排除

如果Pytorch/XLA未按預期執行，請參見“故障排除指南”，該指南具有調試和優化網絡的建議。

提供反饋

Pytorch/XLA團隊總是很高興收到用戶和OSS貢獻者的來信！最好的方法是在此Github上提出問題。歡迎問題，錯誤報告，功能請求，構建問題等！

貢獻

請參閱貢獻指南。

免責聲明

該存儲庫由Google，Meta以及貢獻者文件中列出的許多個人貢獻者共同維護和維護。有關針對Meta的問題，請發送電子郵件至[email protected]。有關針對Google的問題，請發送電子郵件至[email protected]。對於所有其他問題，請在此處在此存儲庫中打開一個問題。

其他讀數

您可以在

雲TPU VM上的性能調試
懶張量介紹
使用Pytorch / XLA和Cloud TPU VM縮放深度學習工作負載
使用FSDP在雲TPU上縮放Pytorch模型

xla

Pytorch/XLA

安裝

TPU

GPU插件

入門

Pytorch/XLA教程

可用的碼頭圖像和車輪

Python包

在08/20/2024之後使用夜間構建

Docker

故障排除

提供反饋

貢獻

免責聲明

其他讀數

相關項目

waymo open dataset

SmartTube

Sunamu

MySchedule.py

chat.petals.dev

viptools for eslam

chat.petals.dev

GPT Prompt Templates

GPTyped

waymo open dataset

SmartTube

Sunamu

waymo open dataset

wp functions

termwind