
petals


v2.2.0


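Run large language models at home, BitTorrent-style.
Fine-tuning and inference up to 10x faster than offloading

Generate text with distributed Llama 3.1 (up to 405B), Mixtral (8x22B), Falcon (40B+) or BLOOM (176B) and fine-tune them for your own tasks, right from your desktop computer or Google Colab: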

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Choose any model available at https://health.petals.dev
model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"

# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...

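Try now in Colab

Want to run Llama? Request access to its weights, then run huggingface-cli login in the terminal before loading the model. Or just try it in our chatbot app.

Privacy. Your data will be processed with the help of other people in the public swarm. Learn more about privacy here. For sensitive data, you can set up a private swarm among people you trust.

Any questions? Ping us in our Discord!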

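Connect your GPU and increase Petals capacity

Petals is a community-run system: we rely on people sharing their GPUs. You can help serve one of the available models or host a new model from the Hugging Face Model Hub!

As an example, here is how to host a part of Llama 3.1 (405B) Instruct on your GPU:

Want to host Llama? Request access to its weights, then run huggingface-cli login in the terminal before loading the model.

Linux + Anaconda. Run these commands for NVIDIA GPUs (or follow the AMD instructions):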

conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install git+https://github.com/bigscience-workshop/petals
python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct

Windows + WSL. Please follow the guide on our Wiki.

Docker. Run our Docker image for NVIDIA GPUs (or follow the AMD instructions):

sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 meta-llama/Meta-Llama-3.1-405B-Instruct

macOS + Apple M1/M2 GPU. Install Homebrew, then run these commands:

brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct

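Learn more (how to use multiple GPUs, start the server on boot, etc.)

Security. Hosting a server does not allow others to run custom code on your computer. Learn more here.

Any questions? Ping us in our Discord!

Thank you! Once you load and host 10+ blocks, we can show your name or link on the swarm monitor as a way to say thanks. You can specify them with --public_name YOUR_NAME.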

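How does it work?

You load a small part of the model, then join a network of people serving the other parts. Single-batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B), enough for chatbots and interactive apps.
You can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of PyTorch and Hugging Face Transformers.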
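To make the two points above more concrete, here is a toy, self-contained simulation in plain Python. It is not Petals code and does not use its API: `Server`, `find_route`, `toy_block`, and `run_route` are hypothetical names, the greedy "reach furthest" rule is a simplified stand-in rather than Petals' actual routing logic, and each "block" just adds its index to the hidden state instead of running a transformer layer. It sketches a client chaining servers that each hold a contiguous span of blocks, then passing hidden states through that chain while keeping the output of every hop.

```python
# Toy simulation, NOT the Petals API: every name here is hypothetical.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Server:
    """A hypothetical peer that serves transformer blocks [start, end)."""
    name: str
    start: int
    end: int


def find_route(servers: List[Server], num_blocks: int) -> List[Tuple[str, int, int]]:
    """Greedily chain servers so that blocks 0..num_blocks-1 are each covered once.

    At every step, among the servers holding the next needed block,
    pick the one whose span reaches furthest (a simplified rule).
    """
    route = []
    position = 0
    while position < num_blocks:
        candidates = [s for s in servers if s.start <= position < s.end]
        if not candidates:
            raise RuntimeError(f"no server hosts block {position}")
        best = max(candidates, key=lambda s: s.end)
        hop_end = min(best.end, num_blocks)
        route.append((best.name, position, hop_end))
        position = hop_end
    return route


def toy_block(index: int, hidden: List[float]) -> List[float]:
    """Stand-in for transformer block `index`: adds the block index to every value."""
    return [h + index for h in hidden]


def run_route(route: List[Tuple[str, int, int]], hidden: List[float]):
    """Pass hidden states through each hop in order, keeping every hop's output."""
    hop_outputs = []
    for _name, start, end in route:
        for index in range(start, end):  # a real server would run these blocks remotely
            hidden = toy_block(index, hidden)
        hop_outputs.append(hidden)
    return hidden, hop_outputs


if __name__ == "__main__":
    swarm = [
        Server("alice", 0, 4),
        Server("bob", 2, 8),
        Server("carol", 6, 10),
        Server("dave", 0, 2),
    ]
    route = find_route(swarm, num_blocks=10)
    print(route)  # [('alice', 0, 4), ('bob', 4, 8), ('carol', 8, 10)]
    final, states = run_route(route, [0.0, 1.0])
    print(final)  # [45.0, 46.0]
    print(states)  # [[6.0, 7.0], [28.0, 29.0], [45.0, 46.0]]
```

In this example swarm, dave is never picked because alice holds the same blocks and reaches further. A real client would presumably also weigh each server's latency and throughput, which this sketch ignores.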

Read the paper · See the FAQ

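Tutorials, examples, and more

Basic tutorials:

Getting started: tutorial
Prompt-tune Llama-65B for text semantic classification: tutorial
Prompt-tune BLOOM to create a personified chatbot: tutorial

Useful tools:

Chatbot web app (connects to Petals via an HTTP/WebSocket endpoint): source code
Monitor for the public swarm: source code

Advanced guides:

Launch a private swarm: guide
Run a custom model: guide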

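Benchmarks

Please see Section 3.3 of our paper.

Contributing

Please see our FAQ on contributing.

Citation

Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel. Petals: Collaborative Inference and Fine-tuning of Large Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). 2023.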

@inproceedings{borzunov2023petals,
  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},
  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  pages = {558--568},
  year = {2023},
  url = {https://arxiv.org/abs/2209.01188}
}

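Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, and Colin Raffel. Distributed inference and fine-tuning of large language models over the Internet. Advances in Neural Information Processing Systems 36 (2023).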

@inproceedings{borzunov2023distributed,
  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},
  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},
  booktitle = {Advances in Neural Information Processing Systems},
  volume = {36},
  pages = {12312--12331},
  year = {2023},
  url = {https://arxiv.org/abs/2312.08361}
}