Sarathi-Serve

Sarathi-Serve is a high-throughput, low-latency LLM serving framework. For details, see the OSDI'24 paper.

Setup

Set up CUDA

Sarathi-Serve has been tested with CUDA 12.3 on H100 and A100 GPUs.
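
Before going further, it can help to confirm that the NVIDIA driver and CUDA toolkit are visible from the shell. The snippet below is a minimal sketch, not part of Sarathi-Serve: `tool_status` is a hypothetical helper name, and the snippet assumes that `nvidia-smi` (shipped with the driver) and `nvcc` (shipped with the CUDA toolkit) are the tools you want to check.

```shell
# Hypothetical helper: print "found" if a command is on PATH, else "missing".
tool_status() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

# nvidia-smi ships with the NVIDIA driver; nvcc ships with the CUDA toolkit.
for tool in nvidia-smi nvcc; do
  echo "$tool: $(tool_status "$tool")"
done

# When available, print the GPU model and driver version, then the toolkit version.
if [ "$(tool_status nvidia-smi)" = "found" ]; then
  nvidia-smi --query-gpu=name,driver_version --format=csv
fi
if [ "$(tool_status nvcc)" = "found" ]; then
  nvcc --version
fi
```

If `nvcc` is reported as missing while `nvidia-smi` is found, the driver is installed but the toolkit is probably not on `PATH`.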

Clone the repository

git clone git@github.com:microsoft/sarathi-serve.git

Create a mamba environment

Set up mamba if you don't already have it:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-Linux-x86_64.sh # follow the instructions from there

Create a Python 3.10 environment:

mamba create -p ./env python=3.10
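
Because the environment is created with `-p ./env`, it lives at a path rather than under a name, so it is activated by path. The following is a sketch under stated assumptions: it assumes mamba's shell integration has been initialized (for example with `mamba init` followed by a new shell), that it is run from the directory containing `./env`, and `is_py310` is a hypothetical helper used only to check the interpreter version.

```shell
# Hypothetical helper: succeed if a "Python X.Y.Z" string names a 3.10 release.
is_py310() {
  case "$1" in
    "Python 3.10."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Activate the prefix environment only if mamba is on PATH and ./env exists.
# Note: `mamba activate` itself additionally requires the shell integration.
if [ -d ./env ] && command -v mamba >/dev/null 2>&1; then
  mamba activate ./env
  if is_py310 "$(python --version 2>&1)"; then
    echo "Python 3.10 environment is active"
  else
    echo "Unexpected interpreter: $(python --version 2>&1)"
  fi
fi
```

Running the install step inside the activated environment keeps the `pip install` from touching the system Python.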

Install Sarathi-Serve

pip install -e . --extra-index-url https://flashinfer.ai/whl/cu121/torch2.3/
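
After the install finishes, a quick import check can catch a broken environment early. This is a minimal sketch rather than an official verification step: `check_import` is a hypothetical helper, and the module names `torch` and `sarathi` are assumptions about what the install provides.

```shell
# Hypothetical helper: try to import a Python module and report the result.
# Usage: check_import <module>
check_import() {
  if python3 -c "import $1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: not importable"
  fi
}

check_import torch     # assumption: PyTorch is pulled in as a dependency
check_import sarathi   # assumption: the package is importable as `sarathi`
```

If either line reports `not importable`, check that the environment is active and rerun the install command.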

Reproducing results

Refer to the readme in the folder for each figure under osdi-experiments.

Citation

If you use our work, please consider citing our paper:

@article{agrawal2024taming,
  title={Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve},
  author={Agrawal, Amey and Kedia, Nitin and Panwar, Ashish and Mohan, Jayashree and Kwatra, Nipun and Gulavani, Bhargav S and Tumanov, Alexey and Ramjee, Ramachandran},
  journal={Proceedings of 18th USENIX Symposium on Operating Systems Design and Implementation, 2024, Santa Clara},
  year={2024}
}