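GoodAI-LTM

GoodAI-LTM equips agents with text-based long-term memory by combining essential components such as text embedding models, reranking, vector databases, memory and query rewriting, automatic chunking, chunk metadata, and chunk expansion. The package is specifically designed to offer a dialog-centric memory stream for social agents.

Additionally, GoodAI-LTM includes a conversational agent component (LTMAgent) for seamless integration into Python-based applications.

Installation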

 pip install goodai-ltm

Usage of LTMAgent

Call the reply method of an LTMAgent instance to get a response from the agent.

 from goodai.ltm.agent import LTMAgent

agent = LTMAgent(model="gpt-3.5-turbo")
response = agent.reply("What can you tell me about yourself?")
print(response)

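The model parameter can be the name of any model supported by the litellm library.

The agent maintains the session history automatically. To start a new session, call the new_session method.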

 agent.new_session()
print(f"Number of messages in session: {len(agent.session.message_history)}")

The agent has a conversational memory and a knowledge base. You can tell the agent to store knowledge by calling the add_knowledge method.

 agent.clear_knowledge()
agent.add_knowledge("The user's birthday is February 10.")
agent.add_knowledge("Refer to the user as 'boss'.")
response = agent.reply("Today is February 10. I think this is an important date. Can you remind me?")
print(response)

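LTMAgent is a seamless RAG system. The ltm_agent_with_wiki example shows how to add Wikipedia articles to the agent's knowledge base.

You can persist the agent's configuration and its memories/knowledge by obtaining its state as a string via the state_as_text method.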

 state_text = agent.state_as_text()
# Persist state_text to secondary storage

To build an agent from state text, call the from_state_text method.

 agent2 = LTMAgent.from_state_text(state_text)

Note that this does not restore the conversation session. To persist the conversation session, call the state_as_text method of the session.

 from goodai.ltm.agent import LTMAgentSession

session_state_text = agent.session.state_as_text()
# session_state_text can be persisted in secondary storage
# The session.session_id field can serve as an identifier of the persisted session
# Now let's restore the session in agent2
p_session = LTMAgentSession.from_state_text(session_state_text)
agent2.use_session(p_session)
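
The comments in the snippets above leave the storage step open. The following is a minimal sketch of one way to do it, assuming plain files on local disk are an acceptable form of secondary storage. The helpers save_state and load_state and the file layout are hypothetical and not part of GoodAI-LTM; the placeholder strings stand in for the output of state_as_text.

```python
import tempfile
from pathlib import Path


def save_state(directory: Path, key: str, state_text: str) -> Path:
    """Write a state string to <directory>/<key>.txt and return the path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{key}.txt"
    path.write_text(state_text, encoding="utf-8")
    return path


def load_state(directory: Path, key: str) -> str:
    """Read back a state string previously written by save_state."""
    return (directory / f"{key}.txt").read_text(encoding="utf-8")


# Placeholder strings (assumption): in real use these would come from
# agent.state_as_text() and agent.session.state_as_text().
agent_state = "placeholder agent state"
session_state = "placeholder session state"

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    save_state(store, "agent", agent_state)
    # A real session could be keyed by its session_id instead of "session-1".
    save_state(store, "session-1", session_state)
    restored = load_state(store, "session-1")

print(restored == session_state)
```

The restored strings would then be passed to LTMAgent.from_state_text and LTMAgentSession.from_state_text as shown above.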

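Usage of Text Memory (Low Level)

The following code snippet creates an instance of the LTM, loads some text, and then retrieves the most relevant text passages (expanded chunks) for a query: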

 from goodai.ltm.mem.auto import AutoTextMemory
mem = AutoTextMemory.create()
mem.add_text("Lorem ipsum dolor sit amet, consectetur adipiscing elit\n")
mem.add_text("Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore\n",
             metadata={'title': 'My document', 'tags': ['latin']})
r_memories = mem.retrieve(query='dolorem eum fugiat quo voluptas nulla pariatur?', k=3)

Creating a text memory instance

A default memory instance can be created as follows:

 from goodai.ltm.mem.auto import AutoTextMemory

mem = AutoTextMemory.create()

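You can also configure the memory by passing parameters to the create method. In the following example, the memory uses the "gpt2" tokenizer for chunking, a T5 model for embeddings, a FAISS index for embedding storage (instead of a simple vector database), and a custom chunking configuration.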

 import torch
from transformers import AutoTokenizer
from goodai.ltm.embeddings.auto import AutoTextEmbeddingModel
from goodai.ltm.mem.auto import AutoTextMemory
from goodai.ltm.mem.config import TextMemoryConfig
from goodai.ltm.mem.mem_foundation import VectorDbType

embedding_model = AutoTextEmbeddingModel.from_pretrained('st:sentence-transformers/sentence-t5-base')
tokenizer = AutoTokenizer.from_pretrained('gpt2')
config = TextMemoryConfig()
config.chunk_capacity = 30  # tokens
config.queue_capacity = 10000  # chunks
mem = AutoTextMemory.create(emb_model=embedding_model,
                            matching_model=None, 
                            tokenizer=tokenizer,
                            vector_db_type=VectorDbType.FAISS_FLAT_L2, 
                            config=config,
                            device=torch.device('cuda:0'))

Adding text to memory

Call the add_text method to add text to the memory. The text may consist of phrases, sentences, or whole documents.

 mem.add_text("Lorem ipsum dolor sit amet, consectetur adipiscing elit\n")

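Internally, the memory chunks and indexes the text automatically.

Text can be associated with an arbitrary metadata dictionary, for example: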

 mem.add_text("Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore\n",
             metadata={'title': 'My document', 'tags': ['latin']})

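The memory concatenates text stored with add_text with any text previously sent to the memory. Call add_separator to ensure that new text is not added to previously created chunks.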
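
The effect of concatenation and separators can be illustrated with a toy model. This is a sketch only: ToyChunker is a hypothetical class that counts whitespace-separated words, whereas the real memory chunks tokenizer tokens and its internal behavior may differ in other ways.

```python
class ToyChunker:
    """Illustrative only: groups whitespace tokens into fixed-capacity chunks."""

    def __init__(self, chunk_capacity):
        self.chunk_capacity = chunk_capacity
        self.chunks = []      # each chunk is a list of tokens
        self._sealed = True   # True means the next token must open a new chunk

    def add_text(self, text):
        for token in text.split():
            if self._sealed or len(self.chunks[-1]) >= self.chunk_capacity:
                self.chunks.append([])
                self._sealed = False
            self.chunks[-1].append(token)

    def add_separator(self):
        # Seal the current chunk so later text cannot be appended to it.
        self._sealed = True


chunker = ToyChunker(chunk_capacity=4)
chunker.add_text("a b c")
chunker.add_text("d e")    # "d" joins the first chunk, "e" opens a second one
chunker.add_separator()
chunker.add_text("f")      # opens a third chunk although the second has room
print([" ".join(tokens) for tokens in chunker.chunks])
# ['a b c d', 'e', 'f']
```

Without the separator, "f" would have been appended to the chunk containing "e".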

Retrieval

To retrieve a list of passages associated with a query, call the retrieve method:

 r_memories = mem.retrieve("What does Jake propose?", k=2)

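The retrieve method returns a list of objects of type RetrievedMemory, in descending order of relevance. Each retrieved memory has the following properties:

passage: The text of the memory. This corresponds to text found in a matching chunk, but it may be expanded with text from adjacent chunks.
timestamp: The time at which the retrieved chunk was created (by default, seconds since the Epoch).
distance: The calculated distance between the query and the chunk passage.
relevance: A number between 0 and 1 representing the relevance of the retrieved memory.
confidence: If a query-passage matching model is available, the probability assigned by that model.
metadata: The metadata associated with the retrieved text, if any.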
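
As an illustration of how these properties might be consumed, the sketch below filters results by relevance and formats them for display. MemoryRecord is a hypothetical stand-in that only mirrors the attribute names listed above; real objects would come from mem.retrieve, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MemoryRecord:
    """Hypothetical stand-in with the same attribute names as RetrievedMemory."""
    passage: str
    timestamp: float
    distance: float
    relevance: float
    confidence: Optional[float] = None
    metadata: Optional[dict] = None


def format_memories(memories, min_relevance=0.0):
    """Return one display line per memory whose relevance meets the threshold."""
    lines = []
    for m in memories:
        if m.relevance < min_relevance:
            continue
        # Interpret the timestamp as seconds since the Epoch, in UTC.
        when = datetime.fromtimestamp(m.timestamp, tz=timezone.utc)
        title = (m.metadata or {}).get("title", "untitled")
        lines.append(f"[{when:%Y-%m-%d}] ({title}) {m.passage.strip()}")
    return lines


# Invented sample values, for illustration only.
sample = [
    MemoryRecord("Jake proposes a picnic.\n", 0.0, 0.4, 0.9, metadata={"title": "Chat"}),
    MemoryRecord("Unrelated text.", 86400.0, 1.3, 0.2),
]
print(format_memories(sample, min_relevance=0.5))
# ['[1970-01-01] (Chat) Jake proposes a picnic.']
```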

Embedding models

Loading

An embedding model is loaded as follows:

 from goodai.ltm.embeddings.auto import AutoTextEmbeddingModel

em = AutoTextEmbeddingModel.from_pretrained(model_name)

The model_name can be one of the following:

A SentenceTransformer (Huggingface), starting with "st:", for example "st:sentence-transformers/multi-qa-mpnet-base-cos-v1".
A flag embedding model, starting with "flag:", for example "flag:BAAI/bge-base-en-v1.5".
An OpenAI embedding model name, starting with "openai:", for example "openai:text-embedding-ada-002".
One of our fine-tuned models:

Name	Base model	# parameters	# stored embeddings
em-MiniLM-p1-01	multi-qa-MiniLM-L6-cos-v1	22.7m	1
em-MiniLM-p3-01	multi-qa-MiniLM-L6-cos-v1	22.7m	3
em-distilroberta-p1-01	sentence-transformers/all-distilroberta-v1	82.1m	1
em-distilroberta-p3-01	sentence-transformers/all-distilroberta-v1	82.1m	3
em-distilroberta-p5-01	sentence-transformers/all-distilroberta-v1	82.1m	5
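
The naming convention above can be summarized in a few lines of code. This is a sketch of the convention only, not of how the library dispatches internally; split_model_name and the family labels it returns are hypothetical.

```python
def split_model_name(model_name):
    """Split 'prefix:rest' into (family, name).

    Names without one of the documented prefixes are treated as
    fine-tuned GoodAI models (an assumption made for this sketch).
    """
    known = {
        "st": "sentence-transformers",
        "flag": "flag-embedding",
        "openai": "openai",
    }
    prefix, sep, rest = model_name.partition(":")
    if sep and prefix in known:
        return known[prefix], rest
    return "goodai-fine-tuned", model_name


for name in ["st:sentence-transformers/multi-qa-mpnet-base-cos-v1",
             "flag:BAAI/bge-base-en-v1.5",
             "openai:text-embedding-ada-002",
             "em-MiniLM-p1-01"]:
    print(split_model_name(name))
```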

Usage of embedding models

To get embeddings for a list of queries, call the encode_queries method as follows:

 r_emb = em.encode_queries(['hello?'])

This returns a numpy array. To get a PyTorch tensor instead, add the convert_to_tensor parameter:

 r_emb = em.encode_queries(['hello?'], convert_to_tensor=True)

To get embeddings for a list of passages, call the encode_corpus method as follows:

 s_emb = em.encode_corpus(['it was...', 'the best of...'])

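A query or passage can have more than one embedding. Embedding tensors have 3 axes: batch size, number of embeddings, and number of embedding dimensions. Typically the number of embeddings per query/passage is 1, with some exceptions.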
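
The three-axis layout can be pictured with plain nested lists. The sketch below uses made-up zero vectors and a hypothetical helper, shape_of; it does not call the library. With a numpy array the equivalent operations would be arr.shape and arr[:, 0, :].

```python
def shape_of(nested):
    """Return the shape of a regular nested list, e.g. (batch, n_emb, dim)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


# Two passages, three embeddings each, four dimensions per embedding.
batch = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
print(shape_of(batch))             # (2, 3, 4)

# Keep only the first embedding of every passage.
first_embeddings = [item[0] for item in batch]
print(shape_of(first_embeddings))  # (2, 4)
```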

Query-passage matching models

Loading

A query-passage matching/reranking model can be loaded as follows:

 from goodai.ltm.reranking.auto import AutoTextMatchingModel

model = AutoTextMatchingModel.from_pretrained(model_name)

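The model_name can be one of the following:

A "st:" prefix followed by the name of a Huggingface cross-encoder compatible with the SentenceTransformers library, for example "st:cross-encoder/stsb-distilroberta-base".
An "em:" prefix followed by the name of an embedding model supported by this library, for example "em:openai:text-embedding-ada-002" or "em:em-distilroberta-p3-01".

By default, memory instances do not use a query-passage matching model. To enable one, configure the memory as follows: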

 from goodai.ltm.embeddings.auto import AutoTextEmbeddingModel
from goodai.ltm.mem.auto import AutoTextMemory
from goodai.ltm.mem.config import TextMemoryConfig
from goodai.ltm.reranking.auto import AutoTextMatchingModel


# Low-resource embedding model
emb_model = AutoTextEmbeddingModel.from_pretrained('em-MiniLM-p1-01')
# QPM model that boosts retrieval accuracy
qpm_model = AutoTextMatchingModel.from_pretrained('em:em-distilroberta-p5-01')
config = TextMemoryConfig()
config.reranking_k_factor = 8
mem = AutoTextMemory.create(matching_model=qpm_model, emb_model=emb_model, config=config)

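The reranking_k_factor setting tells the memory how many candidates it should consider for reranking. When the user requests k memories, the reranking algorithm considers k * reranking_k_factor chunks.

Usage of query-passage matching models

The predict method of the model takes a list of query-passage tuples and returns a list of floats representing estimated match probabilities. Example: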
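
The two-stage idea behind this setting can be sketched as follows. This is an illustration under assumptions, not the library's implementation: retrieve_with_reranking is a hypothetical function, candidates are (passage, distance) pairs, and the scores are invented stand-ins for a matching model's output.

```python
def retrieve_with_reranking(candidates, score_fn, k, reranking_k_factor):
    """candidates: list of (passage, distance); score_fn: passage -> score."""
    n_candidates = k * reranking_k_factor
    # Stage 1: keep the n_candidates nearest passages by embedding distance.
    shortlist = sorted(candidates, key=lambda c: c[1])[:n_candidates]
    # Stage 2: reorder the shortlist by matching score, best first.
    reranked = sorted(shortlist, key=lambda c: score_fn(c[0]), reverse=True)
    return [passage for passage, _ in reranked[:k]]


# Invented distances and scores, for illustration only.
candidates = [("p1", 0.10), ("p2", 0.20), ("p3", 0.30), ("p4", 0.40), ("p5", 0.50)]
scores = {"p1": 0.2, "p2": 0.9, "p3": 0.1, "p4": 0.8, "p5": 0.99}

print(retrieve_with_reranking(candidates, scores.get, k=2, reranking_k_factor=2))
# ['p2', 'p4']
```

In this toy run "p5" has the highest score but is never examined, because it falls outside the 4 candidates shortlisted by distance. A larger factor widens the shortlist at the cost of more scoring calls.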

 model = AutoTextMatchingModel.from_pretrained('em:em-distilroberta-p5-01')
sentences = [
    ('Mike: What is your favorite color?', 'Steve: My favorite color is purple.'),
    ('Name the inner planets.', 'It was the best of times, it was the worst of times.'),
]
prob = model.predict(sentences)
print(prob)