
Sarathi-Serve

Sarathi-Serve is a high-throughput, low-latency LLM serving framework. Please refer to our OSDI'24 paper for more details.

Setup

Set up CUDA

Sarathi-Serve has been tested with CUDA 12.3 on H100 and A100 GPUs.
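Before installing, it can help to confirm which CUDA toolkit and GPU the machine actually exposes. The script below is a minimal sketch and is not part of Sarathi-Serve: `parse_cuda_release` is a hypothetical helper that pulls the release number out of `nvcc --version`, and the GPU query uses standard `nvidia-smi` options.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report the CUDA toolkit release and the visible GPUs.

# Reads `nvcc --version` output on stdin and prints the release, e.g. "12.3".
# Prints nothing if no "release X.Y" line is found.
parse_cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  echo "CUDA toolkit release: $(nvcc --version | parse_cuda_release)"
else
  echo "nvcc not found on PATH" >&2
fi

if command -v nvidia-smi >/dev/null 2>&1; then
  # One line per GPU: model name and driver version.
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
  echo "nvidia-smi not found on PATH" >&2
fi
```

The script only reports what it finds and never aborts, so it is safe to run on a machine that is missing either tool.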

Clone the repository

git clone git@github.com:microsoft/sarathi-serve.git

Create a mamba environment

Set up mamba if you don't already have it,

wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-Linux-x86_64.sh # follow the instructions from there

Create a Python 3.10 environment,

mamba create -p ./env python=3.10
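Once the environment is activated (for example with `mamba activate ./env`, which assumes `mamba init` has been run once for your shell), a quick check that `python` resolves to 3.10 can catch a wrong or inactive environment before installing. This is a sketch under that assumption; `is_python_310` is a hypothetical helper, not something the repository provides.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm the active interpreter is Python 3.10.x.

# Succeeds only for a version string of the form "Python 3.10.<patch>".
is_python_310() {
  case "$1" in
    "Python 3.10."*) return 0 ;;
    *) return 1 ;;
  esac
}

# `|| true` keeps the script going if `python` is not on PATH at all.
version="$(python --version 2>&1 || true)"
if is_python_310 "$version"; then
  echo "OK: $version"
else
  echo "Expected Python 3.10, got: $version" >&2
fi
```

Matching on the literal `3.10.` prefix (including the trailing dot) avoids accepting versions such as `3.1.0`.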

Install Sarathi-Serve

pip install -e . --extra-index-url https://flashinfer.ai/whl/cu121/torch2.3/
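After installation, a small smoke test from the activated environment can confirm that the packages import. This is a sketch with two assumptions: that the project is importable as `sarathi` (the top-level package directory in the repository) and that PyTorch was pulled in as a dependency. `can_import` is a hypothetical helper; `torch.cuda.is_available()` is a standard PyTorch call.

```shell
#!/usr/bin/env bash
# Hypothetical post-install smoke test; run from the activated ./env.
# Assumption: the package is importable as `sarathi`; adjust if your checkout differs.

PYTHON="${PYTHON:-python}"

# Succeeds only if the named module imports without error.
can_import() {
  "$PYTHON" -c "import $1" >/dev/null 2>&1
}

for mod in torch sarathi; do
  if can_import "$mod"; then
    echo "import $mod: ok"
  else
    echo "import $mod: FAILED" >&2
  fi
done

# Report whether PyTorch can see a CUDA device.
if can_import torch; then
  "$PYTHON" -c "import torch; print('CUDA available:', torch.cuda.is_available())"
fi
```

Setting `PYTHON` lets the same check target a specific interpreter, e.g. `PYTHON=./env/bin/python`.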

Reproducing results

Refer to the README in each figure's folder under osdi-experiments.

Citation

If you use our work, please consider citing our paper:

@article{agrawal2024taming,
  title={Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve},
  author={Agrawal, Amey and Kedia, Nitin and Panwar, Ashish and Mohan, Jayashree and Kwatra, Nipun and Gulavani, Bhargav S and Tumanov, Alexey and Ramjee, Ramachandran},
  journal={Proceedings of 18th USENIX Symposium on Operating Systems Design and Implementation, 2024, Santa Clara},
  year={2024}
}