Batch Inference Toolkit

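Batch Inference Toolkit (batch-inference) is a Python package that dynamically batches model input tensors coming from multiple requests, executes the model, un-batches the output tensors, and returns each result to the request it belongs to. Batching improves system throughput through better compute parallelism and better cache locality. The whole process is transparent to developers.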
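The batch, execute, un-batch cycle described above can be sketched with plain NumPy. This is only an illustration of the idea, not the package's implementation: the helper names batch_inputs and unbatch_outputs and the fixed weights are assumptions made for this example.

```python
import numpy as np


def batch_inputs(inputs):
    """Concatenate per-request arrays along axis 0 and remember each size."""
    sizes = [x.shape[0] for x in inputs]
    return np.concatenate(inputs, axis=0), sizes


def unbatch_outputs(batched, sizes):
    """Split the batched output back into one array per request."""
    offsets = np.cumsum(sizes)[:-1].tolist()
    return np.split(batched, offsets, axis=0)


# Hypothetical model: a single matrix multiplication with fixed weights.
weights = np.arange(9, dtype="f").reshape(3, 3)


def predict_batch(x):
    # shape of x: [batch_size, m, k]
    return np.matmul(x, weights)


# Two requests with different batch sizes (1 and 2).
requests = [
    np.ones((1, 2, 3), dtype="f"),
    np.full((2, 2, 3), 2.0, dtype="f"),
]

batched, sizes = batch_inputs(requests)          # one tensor for all requests
outputs = unbatch_outputs(predict_batch(batched), sizes)  # one array per request
```

The model runs once on a tensor of shape [3, 2, 3] instead of twice on smaller tensors, and each request still receives an output with its own leading dimension.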

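When to use

When you want to host deep learning model inference on cloud servers, especially on GPUs

Why use it

It can increase your server throughput several times over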

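Advantages of batch-inference

Platform-independent, lightweight Python library
Only a few lines of code need to change to onboard using the built-in batching algorithms
Flexible APIs that support customized batching algorithms and input types
Multi-process remote mode that avoids the Python GIL bottleneck
Tutorials and benchmarks on popular models: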

Model	Throughput Compared to Baseline	Links
BERT Embedding	4.7x	Tutorial
GPT Completion	16x	Tutorial

Installation

Install from pip

python -m pip install batch-inference --upgrade

Build and install from source (for developers)

git clone https://github.com/microsoft/batch-inference.git
cd batch-inference
python -m pip install -e ".[docs,testing]"

# if you want to format the code before commit
pip install pre-commit
pre-commit install

# run unittests
python -m unittest discover tests

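Example

Let's start with a toy model to learn the APIs. First, define a predict_batch method in your model class, then add the batching decorator to the class.

The batching decorator adds a host() method that creates a ModelHost object. The predict method of ModelHost takes a single query as input and merges multiple queries into a batch before calling predict_batch. Before returning, predict also splits the output of predict_batch so that each caller receives only its own result.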

import numpy as np
from batch_inference import batching
from batch_inference.batcher.concat_batcher import ConcatBatcher

@batching(batcher=ConcatBatcher(), max_batch_size=32)
class MyModel:
    def __init__(self, k, n):
        self.weights = np.random.randn(k, n).astype("f")

    # shape of x: [batch_size, m, k]
    def predict_batch(self, x):
        y = np.matmul(x, self.weights)
        return y

# initialize MyModel with k=3 and n=3
host = MyModel.host(3, 3)
host.start()

# called once per incoming request; shape of x: [1, 3, 3]
def process_request(x):
    y = host.predict(x)
    return y

# stop the host once no more requests will be served
host.stop()
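To show what happens behind predict when several requests arrive at once, here is a minimal, self-contained sketch of dynamic batching built on a queue and a worker thread. ToyHost is a hypothetical stand-in written for this example; it does not use batch-inference and is not how the package is implemented, and it assumes concatenation along axis 0 as the batching strategy.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor

import numpy as np


class MyModel:
    def __init__(self, k, n):
        self.weights = np.random.randn(k, n).astype("f")

    # shape of x: [batch_size, m, k]
    def predict_batch(self, x):
        return np.matmul(x, self.weights)


class ToyHost:
    """Hypothetical host for illustration only (not batch-inference code)."""

    def __init__(self, model, max_batch_size=32):
        self.model = model
        self.max_batch_size = max_batch_size
        self.pending = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.batch_sizes = []  # recorded only so the batching can be inspected

    def start(self):
        self.worker.start()

    def stop(self):
        self.pending.put(None)  # sentinel: tells the worker to exit
        self.worker.join()

    def predict(self, x):
        # Enqueue one query and block until the worker delivers its result.
        future = Future()
        self.pending.put((x, future))
        return future.result()

    def _run(self):
        stopping = False
        while not stopping:
            item = self.pending.get()  # wait for the first query
            if item is None:
                return
            batch = [item]
            # Greedily take whatever else is already waiting in the queue.
            while len(batch) < self.max_batch_size:
                try:
                    nxt = self.pending.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stopping = True
                    break
                batch.append(nxt)
            xs = [x for x, _ in batch]
            sizes = [x.shape[0] for x in xs]
            try:
                ys = self.model.predict_batch(np.concatenate(xs, axis=0))
                parts = np.split(ys, np.cumsum(sizes)[:-1].tolist(), axis=0)
                for (_, future), y in zip(batch, parts):
                    future.set_result(y)
            except Exception as exc:  # propagate failures to every caller
                for _, future in batch:
                    future.set_exception(exc)
            self.batch_sizes.append(len(batch))


model = MyModel(3, 3)
host = ToyHost(model)
host.start()

# Simulate 16 concurrent requests, each with shape [1, 3, 3].
queries = [np.random.randn(1, 3, 3).astype("f") for _ in range(16)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(host.predict, queries))

host.stop()
```

How many queries end up in each batch depends on thread timing, so host.batch_sizes varies from run to run. Each caller nevertheless gets the same output it would have received from an unbatched call.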