⚡ Building LLM-powered applications in Ruby ⚡
For a deep Rails integration, see the langchainrb_rails gem.
Available for paid consulting engagements! Email me.
Install the gem and add it to the application's Gemfile by executing:
bundle add langchainrb
If bundler is not being used to manage dependencies, install the gem by executing:
gem install langchainrb
Additional gems may be required. They're not included by default, so you can include only what you need.
require "langchain"
The Langchain::LLM module provides a unified interface for interacting with various Large Language Model (LLM) providers. This abstraction allows you to easily switch between different LLM backends without changing your application code.
All LLM classes inherit from Langchain::LLM::Base and provide a consistent interface for common operations.
Most LLM classes can be initialized with an API key and optional default options:
llm = Langchain::LLM::OpenAI.new(
  api_key: ENV["OPENAI_API_KEY"],
  default_options: { temperature: 0.7, chat_model: "gpt-4o" }
)
Use the embed method to generate embeddings for given text:
response = llm.embed(text: "Hello, world!")
embedding = response.embedding
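Embedding vectors are typically compared with cosine similarity. As a plain-Ruby illustration, independent of the gem (the `cosine_similarity` helper below is our own, not part of Langchain.rb):

```ruby
# Toy vectors standing in for real embeddings returned by llm.embed(...).embedding
a = [0.1, 0.2, 0.3]
b = [0.3, 0.2, 0.1]

# Cosine similarity: dot product divided by the product of the magnitudes
def cosine_similarity(x, y)
  dot = x.zip(y).sum { |xi, yi| xi * yi }
  dot / (Math.sqrt(x.sum { |xi| xi * xi }) * Math.sqrt(y.sum { |yi| yi * yi }))
end

puts cosine_similarity(a, b).round(4)
# => 0.7143
```

A score near 1.0 means the vectors point in nearly the same direction, which is how semantically similar texts show up in embedding space.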
Accepted parameters for embed():
- text: (Required) The input text to embed.
- model: (Optional) The model name to use; otherwise the default embedding model is used.

Use the complete method to generate a completion for a given prompt:
response = llm.complete(prompt: "Once upon a time")
completion = response.completion
Accepted parameters for complete():
- prompt: (Required) The input prompt for the completion.
- max_tokens: (Optional) The maximum number of tokens to generate.
- temperature: (Optional) Controls the randomness of the generation. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more deterministic.
- top_p: (Optional) An alternative to temperature that controls the diversity of generated tokens.
- n: (Optional) The number of completions to generate for each prompt.
- stop: (Optional) Sequences at which the API will stop generating further tokens.
- presence_penalty: (Optional) Penalizes new tokens based on whether they already appear in the text so far.
- frequency_penalty: (Optional) Penalizes new tokens based on how frequently they appear in the text so far.

Use the chat method to generate chat completions:
messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What's the weather like today?" }
  # Google Gemini and Google VertexAI expect messages in a different format:
  # { role: "user", parts: [{ text: "why is the sky blue?" }]}
]
response = llm.chat(messages: messages)
chat_completion = response.chat_completion
Accepted parameters for chat():
- messages: (Required) An array of message objects representing the conversation history.
- model: (Optional) The specific chat model to use.
- temperature: (Optional) Controls the randomness of the generation.
- top_p: (Optional) An alternative to temperature that controls the diversity of generated tokens.
- n: (Optional) The number of chat completion choices to generate.
- max_tokens: (Optional) The maximum number of tokens to generate in the chat completion.
- stop: (Optional) Sequences at which the API will stop generating further tokens.
- presence_penalty: (Optional) Penalizes new tokens based on whether they already appear in the text so far.
- frequency_penalty: (Optional) Penalizes new tokens based on how frequently they appear in the text so far.
- logit_bias: (Optional) Modifies the likelihood of specified tokens appearing in the completion.
- user: (Optional) A unique identifier representing your end user.
- tools: (Optional) A list of tools the model may call.
- tool_choice: (Optional) Controls how the model calls functions.

Thanks to the unified interface, you can easily switch between different LLM providers by changing the class you instantiate:
# Using Anthropic
anthropic_llm = Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])
# Using Google Gemini
gemini_llm = Langchain::LLM::GoogleGemini.new(api_key: ENV["GOOGLE_GEMINI_API_KEY"])
# Using OpenAI
openai_llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
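Because every provider exposes the same surface, application code can be written against the interface alone and exercised with a stand-in object. A minimal sketch (the `FakeLLM` class below is a hypothetical test double of our own, not part of the gem):

```ruby
# A fake LLM whose chat() mimics the response object's chat_completion reader
FakeResponse = Struct.new(:chat_completion)

class FakeLLM
  def chat(messages:)
    FakeResponse.new("echo: #{messages.last[:content]}")
  end
end

# Application code depends only on the shared chat interface,
# so any Langchain::LLM::* client could be passed in its place
def answer(llm, question)
  llm.chat(messages: [{ role: "user", content: question }]).chat_completion
end

puts answer(FakeLLM.new, "hi")
# => echo: hi
```

The same `answer` helper would work unchanged with any of the real clients above, which is the point of the unified interface.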
Each LLM method returns a response object that provides a consistent interface for accessing the results:
- embedding: Returns the embedding vector
- completion: Returns the generated text completion
- chat_completion: Returns the generated chat completion
- tool_calls: Returns the tool calls made by the LLM
- prompt_tokens: Returns the number of tokens in the prompt
- completion_tokens: Returns the number of tokens in the completion
- total_tokens: Returns the total number of tokens used

Note: while the core interface is consistent across providers, some LLMs may offer additional features or parameters. Consult the documentation for each LLM class to learn about provider-specific capabilities and options.
Create a prompt with input variables:
prompt = Langchain::Prompt::PromptTemplate.new(template: "Tell me a {adjective} joke about {content}.", input_variables: ["adjective", "content"])
prompt.format(adjective: "funny", content: "chickens") # "Tell me a funny joke about chickens."
Create a PromptTemplate with just a prompt and no input_variables:
prompt = Langchain::Prompt::PromptTemplate.from_template("Tell me a funny joke about chickens.")
prompt.input_variables # []
prompt.format # "Tell me a funny joke about chickens."
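Under the hood, formatting boils down to substituting each {variable} into the template string. A rough sketch of the idea in plain Ruby (our own illustration, not the library's actual implementation):

```ruby
# Naive {variable} substitution, analogous to what PromptTemplate#format produces
def format_template(template, vars)
  vars.reduce(template) { |text, (name, value)| text.gsub("{#{name}}", value.to_s) }
end

puts format_template("Tell me a {adjective} joke about {content}.",
                     adjective: "funny", content: "chickens")
# => Tell me a funny joke about chickens.
```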
Save the prompt template to a JSON file:
prompt.save(file_path: "spec/fixtures/prompt/prompt_template.json")
Load a new prompt template from a JSON file:
prompt = Langchain::Prompt.load_from_path(file_path: "spec/fixtures/prompt/prompt_template.json")
prompt.input_variables # ["adjective", "content"]
Create a prompt with few-shot examples:
prompt = Langchain::Prompt::FewShotPromptTemplate.new(
  prefix: "Write antonyms for the following words.",
  suffix: "Input: {adjective}\nOutput:",
  example_prompt: Langchain::Prompt::PromptTemplate.new(
    input_variables: ["input", "output"],
    template: "Input: {input}\nOutput: {output}"
  ),
  examples: [
    { "input": "happy", "output": "sad" },
    { "input": "tall", "output": "short" }
  ],
  input_variables: ["adjective"]
)
prompt.format(adjective: "good")
# Write antonyms for the following words.
#
# Input: happy
# Output: sad
#
# Input: tall
# Output: short
#
# Input: good
# Output:
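Conceptually, the few-shot template just joins the prefix, each formatted example, and the suffix. A simplified sketch of that assembly (our own helper names, not the gem's code):

```ruby
# Assemble a few-shot prompt: prefix, formatted examples, then the suffix
def few_shot(prefix:, suffix:, example_template:, examples:, **vars)
  parts = [prefix]
  parts += examples.map do |ex|
    ex.reduce(example_template) { |t, (k, v)| t.gsub("{#{k}}", v) }
  end
  parts << vars.reduce(suffix) { |t, (k, v)| t.gsub("{#{k}}", v.to_s) }
  parts.join("\n\n")
end

puts few_shot(
  prefix: "Write antonyms for the following words.",
  suffix: "Input: {adjective}\nOutput:",
  example_template: "Input: {input}\nOutput: {output}",
  examples: [{ "input" => "happy", "output" => "sad" }],
  adjective: "good"
)
```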
Save the prompt template to a JSON file:
prompt.save(file_path: "spec/fixtures/prompt/few_shot_prompt_template.json")
Load a new prompt template from a JSON file:
prompt = Langchain::Prompt.load_from_path(file_path: "spec/fixtures/prompt/few_shot_prompt_template.json")
prompt.prefix # "Write antonyms for the following words."
Load a new prompt template from a YAML file:
prompt = Langchain::Prompt.load_from_path(file_path: "spec/fixtures/prompt/prompt_template.yaml")
prompt.input_variables #=> ["adjective", "content"]
Parse LLM text responses into structured output, such as JSON.
You can use StructuredOutputParser to generate a prompt that instructs the LLM to provide a JSON response adhering to a specific JSON schema:
json_schema = {
  type: "object",
  properties: {
    name: {
      type: "string",
      description: "Person's name"
    },
    age: {
      type: "number",
      description: "Person's age"
    },
    interests: {
      type: "array",
      items: {
        type: "object",
        properties: {
          interest: {
            type: "string",
            description: "A topic of interest"
          },
          levelOfInterest: {
            type: "number",
            description: "A value between 0 and 100 of how interested the person is in this interest"
          }
        },
        required: ["interest", "levelOfInterest"],
        additionalProperties: false
      },
      minItems: 1,
      maxItems: 3,
      description: "A list of the person's interests"
    }
  },
  required: ["name", "age", "interests"],
  additionalProperties: false
}
parser = Langchain::OutputParsers::StructuredOutputParser.from_json_schema(json_schema)
prompt = Langchain::Prompt::PromptTemplate.new(template: "Generate details of a fictional character.\n{format_instructions}\nCharacter description: {description}", input_variables: ["description", "format_instructions"])
prompt_text = prompt.format(description: "Korean chemistry student", format_instructions: parser.get_format_instructions)
# Generate details of a fictional character.
# You must format your output as a JSON value that adheres to a given "JSON Schema" instance.
# ...
Then parse the LLM response:
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
llm_response = llm.chat(messages: [{ role: "user", content: prompt_text }]).completion
parser.parse(llm_response)
# {
# "name" => "Kim Ji-hyun",
# "age" => 22,
# "interests" => [
# {
# "interest" => "Organic Chemistry",
# "levelOfInterest" => 85
# },
# ...
# ]
# }
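Conceptually, the parser's job is to pull the JSON payload out of the model's free-form text and parse it. A rough (and far less robust) sketch of that idea using only the JSON stdlib, not the gem's implementation:

```ruby
require "json"

# Grab the first {...} span and parse it; real parsers handle many more edge
# cases (code fences, trailing prose, schema validation, malformed output)
def extract_json(text)
  payload = text[/\{.*\}/m]
  raise ArgumentError, "no JSON object found" unless payload
  JSON.parse(payload)
end

puts extract_json('Sure! {"name": "Kim Ji-hyun", "age": 22}')["name"]
# => Kim Ji-hyun
```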
If the parser fails to parse the LLM response, you can use OutputFixingParser. It sends an error message, the previous output, and the original prompt text to the LLM, asking for a "fixed" response:
begin
  parser.parse(llm_response)
rescue Langchain::OutputParsers::OutputParserException => e
  fix_parser = Langchain::OutputParsers::OutputFixingParser.from_llm(
    llm: llm,
    parser: parser
  )
  fix_parser.parse(llm_response)
end
Alternatively, if you don't need to handle the OutputParserException, you can simplify the code:
# we already have the `OutputFixingParser`:
# parser = Langchain::OutputParsers::StructuredOutputParser.from_json_schema(json_schema)
fix_parser = Langchain::OutputParsers::OutputFixingParser.from_llm(
  llm: llm,
  parser: parser
)
fix_parser.parse(llm_response)
See here for a concrete example.
RAG is a methodology that helps LLMs generate accurate and up-to-date information. A typical RAG workflow follows the 3 steps below:
Langchain.rb provides a convenient unified interface on top of supported vector search databases that makes it easy to configure your index, add data, query, and retrieve from it.
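To make the retrieve-then-generate flow concrete, here is a toy, in-memory sketch in plain Ruby. The scoring is naive word overlap rather than real embeddings, and all names here are our own illustration, not the gem's API:

```ruby
# Step 1: a tiny in-memory "index" of documents
DOCS = [
  "Ruby 3.3 added the Prism parser.",
  "Chickens are domesticated birds.",
  "RAG retrieves documents to ground LLM answers."
].freeze

# Step 2: retrieve the top-k documents by naive word overlap with the query
def retrieve(query, k: 1)
  words = query.downcase.split
  DOCS.max_by(k) { |doc| (doc.downcase.split & words).size }
end

# Step 3: build an augmented prompt that a real LLM would then answer
def build_prompt(question)
  context = retrieve(question).join("\n")
  "Context:\n#{context}\n\nQuestion: #{question}\nAnswer:"
end

puts build_prompt("What does RAG retrieve?")
```

A real pipeline replaces the word-overlap scoring with embedding similarity against a vector database, which is exactly what the unified interface below wraps.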
Database | Open-source | Cloud offering |
---|---|---|
Chroma | ✅ | ✅ |
Epsilla | ✅ | ✅ |
Hnswlib | ✅ | ❌ |
Milvus | ✅ | ✅ Zilliz Cloud |
Pinecone | ❌ | ✅ |
Pgvector | ✅ | ✅ |
Qdrant | ✅ | ✅ |
Weaviate | ✅ | ✅ |
Elasticsearch | ✅ | ✅ |
Pick the vector search database you'll use, add the gem dependency, and instantiate the client:
gem "weaviate-ruby", "~> 0.8.9"
Pick and instantiate the LLM provider you'll use to generate embeddings:
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
client = Langchain::Vectorsearch::Weaviate.new(
  url: ENV["WEAVIATE_URL"],
  api_key: ENV["WEAVIATE_API_KEY"],
  index_name: "Documents",
  llm: llm
)
You can instantiate any other supported vector search database:
client = Langchain::Vectorsearch::Chroma.new(...)        # `gem "chroma-db", "~> 0.6.0"`
client = Langchain::Vectorsearch::Epsilla.new(...)       # `gem "epsilla-ruby", "~> 0.0.3"`
client = Langchain::Vectorsearch::Hnswlib.new(...)       # `gem "hnswlib", "~> 0.8.1"`
client = Langchain::Vectorsearch::Milvus.new(...)        # `gem "milvus", "~> 0.9.3"`
client = Langchain::Vectorsearch::Pinecone.new(...)      # `gem "pinecone", "~> 0.1.6"`
client = Langchain::Vectorsearch::Pgvector.new(...)      # `gem "pgvector", "~> 0.2"`
client = Langchain::Vectorsearch::Qdrant.new(...)        # `gem "qdrant-ruby", "~> 0.9.3"`
client = Langchain::Vectorsearch::Elasticsearch.new(...) # `gem "elasticsearch", "~> 8.2.0"`
Create the default schema:
client.create_default_schema
Add plain text data to the vector search database:
client.add_texts(
  texts: [
    "Begin by preheating your oven to 375°F (190°C). Prepare four boneless, skinless chicken breasts by cutting a pocket into the side of each breast, being careful not to cut all the way through. Season the chicken with salt and pepper to taste. In a large skillet, melt 2 tablespoons of unsalted butter over medium heat. Add 1 small diced onion and 2 minced garlic cloves, and cook until softened, about 3-4 minutes. Add 8 ounces of fresh spinach and cook until wilted, about 3 minutes. Remove the skillet from heat and let the mixture cool slightly.",
    "In a bowl, combine the spinach mixture with 4 ounces of softened cream cheese, 1/4 cup of grated Parmesan cheese, 1/4 cup of shredded mozzarella cheese, and 1/4 teaspoon of red pepper flakes. Mix until well combined. Stuff each chicken breast pocket with an equal amount of the spinach mixture. Seal the pocket with a toothpick if necessary. In the same skillet, heat 1 tablespoon of olive oil over medium-high heat. Add the stuffed chicken breasts and sear on each side for 3-4 minutes, or until golden brown."
  ]
)
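Long documents are usually split into overlapping chunks before indexing, so each embedded passage stays within a useful size. A simple fixed-size chunker sketch (our own helper, not the gem's API):

```ruby
# Split text into chunks of at most chunk_size characters, with each chunk
# overlapping the previous one by `overlap` characters to preserve context
def chunk_text(text, chunk_size: 100, overlap: 20)
  step = chunk_size - overlap
  (0...text.length).step(step).map { |i| text[i, chunk_size] }
end

chunks = chunk_text("a" * 250, chunk_size: 100, overlap: 20)
puts chunks.map(&:length).inspect
# => [100, 100, 90, 10]
```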
Or use the file parsers to load, parse, and index data into the database:
my_pdf = Langchain.root.join("path/to/my.pdf")
my_text = Langchain.root.join("path/to/my.txt")
my_docx = Langchain.root.join("path/to/my.docx")
client.add_data(paths: [my_pdf, my_text, my_docx])
Supported file formats: docx, html, pdf, text, json, jsonl, csv, xlsx, eml, pptx.
Retrieve similar documents based on the query string passed in:
client.similarity_search(
  query:,
  k:       # number of results to be retrieved
)
Retrieve similar documents based on the query string passed in via the HyDE technique:
client.similarity_search_with_hyde()
Retrieve similar documents based on the embedding passed in:
client.similarity_search_by_vector(
  embedding:,
  k:       # number of results to be retrieved
)
RAG-based querying:
client.ask(question: "...")
Langchain::Assistant is a powerful and flexible class that combines Large Language Models (LLMs), tools, and conversation management to create intelligent, interactive assistants. It's designed to handle complex conversations, execute tools, and provide coherent responses based on the context of the interaction.
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
assistant = Langchain::Assistant.new(
  llm: llm,
  instructions: "You're a helpful AI assistant",
  tools: [Langchain::Tool::NewsRetriever.new(api_key: ENV["NEWS_API_KEY"])]
)
# Add a user message and run the assistant
assistant.add_message_and_run!(content: "What's the latest news about AI?")
# Supply an image to the assistant
assistant.add_message_and_run!(
  content: "Show me a picture of a cat",
  image_url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
)
# Access the conversation thread
messages = assistant.messages
# Run the assistant with automatic tool execution
assistant.run(auto_tool_execution: true)
# If you want to stream the response, you can add a response handler
assistant = Langchain::Assistant.new(
  llm: llm,
  instructions: "You're a helpful AI assistant",
  tools: [Langchain::Tool::NewsRetriever.new(api_key: ENV["NEWS_API_KEY"])]
) do |response_chunk|
  # ...handle the response stream
  # print(response_chunk.inspect)
end
assistant.add_message(content: "Hello")
assistant.run(auto_tool_execution: true)
Please note that streaming is not currently supported by all LLMs.
Assistant options:
- llm: The LLM instance to use (required)
- tools: An array of tool instances (optional)
- instructions: System instructions for the assistant (optional)
- tool_choice: Specifies how tools should be selected. Default: "auto". A specific tool function name can be passed, which forces the assistant to always use that function.
- parallel_tool_calls: Whether to make multiple parallel tool calls. Default: true
- add_message_callback: A callback function (proc, lambda) called when any message is added to the conversation (optional)
  assistant.add_message_callback = ->(message) { puts "New message: #{message}" }
- tool_execution_callback: A callback function (proc, lambda) called right before a tool is executed (optional)
  assistant.tool_execution_callback = ->(tool_call_id, tool_name, method_name, tool_arguments) { puts "Executing tool_call_id: #{tool_call_id}, tool_name: #{tool_name}, method_name: #{method_name}, tool_arguments: #{tool_arguments}" }
Key methods:
- add_message: Adds the user message to the messages array
- run!: Processes the conversation and generates responses
- add_message_and_run!: Combines adding a message and running the assistant
- submit_tool_output: Manually submit output to a tool call
- messages: Returns the list of ongoing messages

Built-in tools:
- Langchain::Tool::Calculator: Useful for evaluating math expressions. Requires gem "eqn".
- Langchain::Tool::Database: Connect to your SQL database. Requires gem "sequel".
- Langchain::Tool::FileSystem: Interact with the file system (read and write).
- Langchain::Tool::RubyCodeInterpreter: Useful for evaluating generated Ruby code. Requires gem "safe_ruby" (in need of a better solution).
- Langchain::Tool::NewsRetriever: A wrapper around NewsApi.org to fetch news articles.
- Langchain::Tool::Tavily: A wrapper around Tavily AI.
- Langchain::Tool::Weather: Calls the Open Weather API to retrieve current weather.
- Langchain::Tool::Wikipedia: Calls the Wikipedia API.

The Langchain::Assistant can be easily extended with custom tools by creating classes that extend the Langchain::ToolDefinition module and implement the required methods.
class MovieInfoTool
  extend Langchain::ToolDefinition

  define_function :search_movie, description: "MovieInfoTool: Search for a movie by title" do
    property :query, type: "string", description: "The movie title to search for", required: true
  end

  define_function :get_movie_details, description: "MovieInfoTool: Get detailed information about a specific movie" do
    property :movie_id, type: "integer", description: "The TMDb ID of the movie", required: true
  end

  def initialize(api_key:)
    @api_key = api_key
  end

  def search_movie(query:)
    ...
  end

  def get_movie_details(movie_id:)
    ...
  end
end
movie_tool = MovieInfoTool.new(api_key: "...")
assistant = Langchain::Assistant.new(
  llm: llm,
  instructions: "You're a helpful AI assistant that can provide movie information",
  tools: [movie_tool]
)
assistant.add_message_and_run(content: "Can you tell me about the movie 'Inception'?")
# Check the response in the last message in the conversation
assistant.messages.last
The assistant includes error handling for invalid inputs, unsupported LLM types, and tool execution failures. It uses a state machine to manage the conversation flow and handles different scenarios gracefully.
The Evaluations module is a collection of tools that can be used to evaluate and track the performance of the output products of LLM and RAG (Retrieval Augmented Generation) pipelines.
Ragas helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. The implementation is based on this paper and the original Python repo. Ragas tracks the following 3 metrics and assigns a score from 0.0 to 1.0:
# We recommend using Langchain::LLM::OpenAI as your llm for Ragas
ragas = Langchain::Evals::Ragas::Main.new(llm: llm)
# The answer that the LLM generated
# The question (or the original prompt) that was asked
# The context that was retrieved (usually from a vectorsearch database)
ragas.score(answer: "", question: "", context: "")
# =>
# {
# ragas_score: 0.6601257446503674,
# answer_relevance_score: 0.9573145866787608,
# context_relevance_score: 0.6666666666666666,
# faithfulness_score: 0.5
# }
Additional examples available: /examples
Langchain.rb uses the standard Ruby Logger mechanism and defaults to the same level value (currently Logger::DEBUG).
To show all log messages:
Langchain.logger.level = Logger::DEBUG
By default, the logger logs to STDOUT. To configure the log destination (i.e., log to a file), do:
Langchain.logger = Logger.new("path/to/file", **Langchain::LOGGER_OPTIONS)
If you're having trouble installing the unicode gem required by pragmatic_segmenter, try running:
gem install unicode -- --with-cflags="-Wno-incompatible-function-pointer-types"
1. git clone https://github.com/andreibondarev/langchainrb.git
2. cp .env.example .env, then fill out the environment variables in .env
3. bundle exec rake to ensure that the specs pass and to run standardrb
4. bin/console to load the gem in a REPL session. Feel free to add your own instances of LLMs, Tools, Agents, etc. and experiment with them.
5. Optionally, install lefthook git hooks: gem install lefthook && lefthook install -f
Join us in the Langchain.rb Discord server.
Bug reports and pull requests are welcome on GitHub at https://github.com/andreibondarev/langchainrb.
The gem is available as open source under the terms of the MIT License.