ai on gke

v1.7

AI on GKE Assets

This repository contains assets related to AI/ML workloads on Google Kubernetes Engine (GKE).

Overview

Run optimized AI/ML workloads with the orchestration capabilities of the Google Kubernetes Engine (GKE) platform. A robust AI/ML platform considers the following layers:

Infrastructure orchestration that supports GPUs and TPUs for training and serving workloads at scale
Flexible integration with distributed computing and data processing frameworks
Support for multiple teams on the same infrastructure to maximize resource utilization

Infrastructure

The AI-on-GKE application modules assume that you already have a functional GKE cluster. If you do not, follow the instructions in infrastructure/README.md to create a Standard or Autopilot GKE cluster.

 .
├── LICENSE
├── README.md
├── infrastructure
│   ├── README.md
│   ├── backend.tf
│   ├── main.tf
│   ├── outputs.tf
│   ├── platform.tfvars
│   ├── variables.tf
│   └── versions.tf
├── modules
│   ├── gke-autopilot-private-cluster
│   ├── gke-autopilot-public-cluster
│   ├── gke-standard-private-cluster
│   ├── gke-standard-public-cluster
│   ├── jupyter
│   ├── jupyter_iap
│   ├── jupyter_service_accounts
│   ├── kuberay-cluster
│   ├── kuberay-logging
│   ├── kuberay-monitoring
│   ├── kuberay-operator
│   └── kuberay-serviceaccounts
└── tutorial.md

To deploy a new GKE cluster, update infrastructure/platform.tfvars with the appropriate values, then run the Terraform commands below from the infrastructure directory:

terraform init
terraform apply -var-file platform.tfvars
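If you drive the deployment from a script (for example in CI), the two commands above can be run in order from Python. This is a minimal sketch, assuming the `terraform` binary is on `PATH` and the working directory is `infrastructure`; the helper names `build_commands` and `deploy` are hypothetical and not part of the repository.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_commands(var_file: str = "platform.tfvars") -> List[List[str]]:
    """Return the two Terraform invocations from the README, in execution order."""
    return [
        ["terraform", "init"],
        ["terraform", "apply", "-var-file", var_file],
    ]


def deploy(workdir: str = "infrastructure", var_file: str = "platform.tfvars",
           dry_run: bool = False) -> List[List[str]]:
    """Run `terraform init` then `terraform apply` inside workdir.

    With dry_run=True nothing is executed; the commands are only returned.
    """
    commands = build_commands(var_file)
    if dry_run:
        return commands
    if shutil.which("terraform") is None:
        raise RuntimeError("terraform binary not found on PATH")
    if not (Path(workdir) / var_file).is_file():
        raise FileNotFoundError(f"{var_file} not found in {workdir}")
    for cmd in commands:
        # check=True stops before `apply` if `init` fails.
        # `apply` asks for confirmation on the inherited terminal.
        subprocess.run(cmd, cwd=workdir, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print what would be executed without touching any infrastructure.
    for cmd in deploy(dry_run=True):
        print(" ".join(cmd))
```

As written, the `__main__` block only prints the commands; calling `deploy()` without `dry_run=True` executes them against your project.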

Applications

The repo structure looks like this:

 .
├── LICENSE
├── Makefile
├── README.md
├── applications
│   ├── jupyter
│   └── ray
├── contributing.md
├── dcgm-on-gke
│   ├── grafana
│   └── quickstart
├── gke-a100-jax
│   ├── Dockerfile
│   ├── README.md
│   ├── build_push_container.sh
│   ├── kubernetes
│   └── train.py
├── gke-batch-refarch
│   ├── 01_gke
│   ├── 02_platform
│   ├── 03_low_priority
│   ├── 04_high_priority
│   ├── 05_compact_placement
│   ├── 06_jobset
│   ├── Dockerfile
│   ├── README.md
│   ├── cloudbuild-create.yaml
│   ├── cloudbuild-destroy.yaml
│   ├── create-platform.sh
│   ├── destroy-platform.sh
│   └── images
├── gke-disk-image-builder
│   ├── README.md
│   ├── cli
│   ├── go.mod
│   ├── go.sum
│   ├── imager.go
│   └── script
├── gke-dws-examples
│   ├── README.md
│   ├── dws-queues.yaml
│   ├── job.yaml
│   └── kueue-manifests.yaml
├── gke-online-serving-single-gpu
│   ├── README.md
│   └── src
├── gke-tpu-examples
│   ├── single-host-inference
│   └── training
├── indexed-job
│   ├── Dockerfile
│   ├── README.md
│   └── mnist.py
├── jobset
│   └── pytorch
├── modules
│   ├── gke-autopilot-private-cluster
│   ├── gke-autopilot-public-cluster
│   ├── gke-standard-private-cluster
│   ├── gke-standard-public-cluster
│   ├── jupyter
│   ├── jupyter_iap
│   ├── jupyter_service_accounts
│   ├── kuberay-cluster
│   ├── kuberay-logging
│   ├── kuberay-monitoring
│   ├── kuberay-operator
│   └── kuberay-serviceaccounts
├── saxml-on-gke
│   ├── httpserver
│   └── single-host-inference
├── training-single-gpu
│   ├── README.md
│   ├── data
│   └── src
├── tutorial.md
└── tutorials
    ├── e2e-genai-langchain-app
    ├── finetuning-llama-7b-on-l4
    └── serving-llama2-70b-on-l4-gpus

JupyterHub

This repository contains a Terraform template for running JupyterHub on Google Kubernetes Engine. We have also included some example notebooks (under applications/ray/example_notebooks), including one that serves a GPT-J-6B model with Ray AIR (see here for the original notebook). To run them, follow the instructions in applications/ray/README.md to install a Ray cluster.
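From inside a notebook, one common way to reach the Ray cluster is the Ray Client, which uses a `ray://host:port` address and listens on port 10001 by default. This is a minimal sketch, assuming the Ray head service is reachable from the notebook pod; the service name `example-cluster-kuberay-head-svc` and namespace `ray` used below are hypothetical, so check the real ones with `kubectl get svc` in your cluster. The helpers `ray_client_address` and `connect` are illustrative and not part of the repository.

```python
from typing import Optional


def ray_client_address(head_service: str, namespace: Optional[str] = None,
                       port: int = 10001) -> str:
    """Build a Ray Client address such as ray://<service>[.<namespace>]:10001.

    Inside a Kubernetes pod, "<service>.<namespace>" resolves through the
    cluster DNS search domains.
    """
    host = f"{head_service}.{namespace}" if namespace else head_service
    return f"ray://{host}:{port}"


def connect(address: str) -> dict:
    """Connect this notebook to the Ray cluster (needs `pip install "ray[client]"`)."""
    import ray  # imported lazily so the address helper works without Ray installed

    ray.init(address=address)
    # Total CPUs, GPUs, memory, etc. that the cluster reports.
    return ray.cluster_resources()


if __name__ == "__main__":
    # Hypothetical names; replace with the head service of your Ray cluster.
    addr = ray_client_address("example-cluster-kuberay-head-svc", namespace="ray")
    print(addr)
```

Keeping address construction separate from `connect()` lets the same notebook target a different Ray cluster by changing only the service name or namespace.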

This jupyter module deploys the following resources, once per user: