Supercharge Generative AI Serving with Friendli
The Friendli Client offers convenient interfaces for interacting with the endpoint services provided by Friendli Suite, the ultimate solution for serving generative AI models. Designed for flexibility and performance, it supports both synchronous and asynchronous operations, making it easy to integrate powerful AI capabilities into your applications.
To get started with Friendli, install the client package using pip:
pip install friendli-client
Important
You must set the FRIENDLI_TOKEN environment variable before initializing the client instance with client = Friendli(). Alternatively, you can provide your personal access token as the token argument when creating the client, as shown below:
from friendli import Friendli

client = Friendli(token="YOUR PERSONAL ACCESS TOKEN")
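If you prefer the environment variable, here is a minimal sketch of setting FRIENDLI_TOKEN before creating the client (in practice you would usually export it in your shell rather than hard-code it in source):

import os
from friendli import Friendli

# Illustrative only: put FRIENDLI_TOKEN into the process environment before
# constructing the client; Friendli() then picks the token up automatically.
os.environ["FRIENDLI_TOKEN"] = "YOUR PERSONAL ACCESS TOKEN"
client = Friendli()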
Friendli Serverless Endpoints offer a simple, click-and-play interface for accessing popular open-source models such as Llama 3.1. With pay-per-token billing, they are ideal for exploration and experimentation.

To interact with models hosted on serverless endpoints, provide the model code you want to use in the model argument. Refer to the pricing table for the list of available model codes and their prices.
from friendli import Friendli

client = Friendli()

chat_completion = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)
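Each request carries the full message history, so to continue the conversation you resend the previous turns along with the new one. A minimal sketch under the assumption that the endpoint accepts standard assistant-role messages as in the example above (the follow-up question is purely illustrative):

# Append the assistant's reply, then ask a follow-up question in the same conversation.
messages = [
    {"role": "user", "content": "Tell me how to make a delicious pancake"},
    {"role": "assistant", "content": chat_completion.choices[0].message.content},
    {"role": "user", "content": "Can you make it gluten-free?"},
]
follow_up = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=messages,
)
print(follow_up.choices[0].message.content)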
Friendli Dedicated Endpoints enable you to run your custom generative AI models on dedicated GPU resources.

To interact with dedicated endpoints, provide the endpoint ID in the model argument.
import os
from friendli import Friendli

client = Friendli(
    team_id=os.environ["TEAM_ID"],  # If not provided, the default team is used.
    use_dedicated_endpoint=True,
)

chat_completion = client.chat.completions.create(
    model=os.environ["ENDPOINT_ID"],
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)
Friendli Container is ideal for users who prefer to serve LLMs within their own infrastructure. By deploying the Friendli Engine in containers on your on-premise or cloud GPUs, you retain full control over your data and operations, ensuring security and compliance with internal policies.
from friendli import Friendli

client = Friendli(base_url="http://0.0.0.0:8000")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)
You can also use the asynchronous client (AsyncFriendli) with the same interface:

import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
    )
    print(chat_completion.choices[0].message.content)

asyncio.run(main())
To stream the response, pass stream=True:

from friendli import Friendli

client = Friendli()

stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
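If you also need the complete text once streaming finishes, a small variation of the loop above accumulates the deltas while printing them (same API as above):

from friendli import Friendli

client = Friendli()

stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
)

# Collect the streamed deltas so the complete response is available after the loop.
full_text = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
    full_text += delta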
The asynchronous client (AsyncFriendli) uses the same interface to stream the response.
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main() -> None:
    stream = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
If your endpoint serves a Multi-LoRA model, you can send requests to one of the adapters by providing the adapter route in the model argument.

For Friendli Dedicated Endpoints, provide the endpoint ID and the adapter route separated by a colon (:).
import os
from friendli import Friendli

client = Friendli(
    team_id=os.environ["TEAM_ID"],  # If not provided, the default team is used.
    use_dedicated_endpoint=True,
)

chat_completion = client.lora.completions.create(
    model=f"{os.environ['ENDPOINT_ID']}:{os.environ['ADAPTER_ROUTE']}",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
For Friendli Container, simply provide the adapter name.
import os
from friendli import Friendli

client = Friendli(base_url="http://0.0.0.0:8000")

chat_completion = client.lora.completions.create(
    model=os.environ["ADAPTER_NAME"],
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
Important
gRPC is supported only by Friendli Container, and only the streaming API of v1/completions is available.

When the Friendli Container is running in gRPC mode, the client can interact with the gRPC server by being initialized with the use_grpc=True argument.
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,  # Only streaming mode is available
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
The client uses httpx to send HTTP requests. You can provide a custom httpx.Client when initializing Friendli.
import httpx
from friendli import Friendli

with httpx.Client() as http_client:
    client = Friendli(http_client=http_client)
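For example, here is a sketch that configures a longer read timeout on the underlying transport (the timeout value is illustrative, not a recommendation):

import httpx
from friendli import Friendli

# Assumption for illustration: allow more time for slow generations by raising the httpx timeout.
with httpx.Client(timeout=httpx.Timeout(60.0)) as http_client:
    client = Friendli(http_client=http_client)
    ...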
For the async client, you can provide an httpx.AsyncClient.
import httpx
from friendli import AsyncFriendli

async with httpx.AsyncClient() as http_client:
    client = AsyncFriendli(http_client=http_client)
When using the gRPC interface, you can likewise provide a custom grpc.Channel:

import grpc
from friendli import Friendli

with grpc.insecure_channel("0.0.0.0:8000") as channel:
    client = Friendli(use_grpc=True, grpc_channel=channel)
You can use the same interface for the async client.
import grpc.aio
from friendli import AsyncFriendli

async with grpc.aio.insecure_channel("0.0.0.0:8000") as channel:
    client = AsyncFriendli(use_grpc=True, grpc_channel=channel)
The Friendli client provides several ways to manage and release resources.
Both the Friendli and AsyncFriendli clients can hold network connections or other resources during their lifetime. To ensure these resources are released properly, you should either call the close() method or use the client within a context manager.
from friendli import Friendli

client = Friendli()

# Use the client for various operations...

# When done, close the client to release resources
client.close()
For the async client, the pattern is similar:
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main() -> None:
    # Use the client for various async operations...

    # When done, close the client to release resources
    await client.close()

asyncio.run(main())
You can also use a context manager to automatically close the client and release resources when the block is exited, making it a safer and more convenient way to manage resources.
from friendli import Friendli

with Friendli() as client:
    ...
For asynchronous usage:
import asyncio
from friendli import AsyncFriendli

async def main():
    async with AsyncFriendli() as client:
        ...

asyncio.run(main())
When working with streaming responses, it is crucial to close the HTTP connection properly once the interaction is complete. By default, the connection is closed automatically when all data in the stream has been consumed (i.e., when the for loop reaches the end). However, if streaming is interrupted by an exception or other issue, the connection may remain open and will not be released until it is garbage-collected. To ensure that all underlying connections and resources are released correctly, it is important to close the connection explicitly, especially when the stream terminates prematurely.
from friendli import Friendli

client = Friendli()

stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
)

try:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
finally:
    stream.close()  # Ensure the stream is closed after use
For async streams:
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main():
    stream = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    )
    try:
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
    finally:
        await stream.close()  # Ensure the stream is closed after use

asyncio.run(main())
You can also use a context manager to automatically close the stream and release its resources when the block is exited, which is a safer and more convenient approach.
from friendli import Friendli

client = Friendli()

with client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
) as stream:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
For async streams:
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main():
    async with await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    ) as stream:
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
When using the gRPC interface with streaming, you may want to cancel an ongoing stream operation before it completes. This is particularly useful if you need to stop the stream due to a timeout or some other condition.

For synchronous gRPC streams:
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
)

try:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
except SomeException:
    stream.cancel()  # Cancel the stream in case of an error or interruption
For asynchronous gRPC streams:
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def main():
    stream = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    )
    try:
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
    except SomeException:
        stream.cancel()  # Cancel the stream in case of an error or interruption

asyncio.run(main())
You can also call the generation API directly with the CLI.
friendli api chat-completions create \
  -g "user Tell me how to make a delicious pancake" \
  -m meta-llama-3.1-8b-instruct
For more information about the friendli command, run friendli --help in your terminal shell. This will give you a detailed list of available options and usage instructions.
Tip
Check out our official documentation to learn more!