GoodAI-LTM

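GoodAI-LTM equips agents with text-based long-term memory by combining essential components: text embedding models, reranking, vector databases, memory and query rewriting, automatic chunking, chunk metadata, and chunk expansion. The package is specifically designed to provide a dialog-centric memory stream for social agents.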

In addition, GoodAI-LTM includes a conversational agent component (LTMAgent) for seamless integration into Python-based applications.

Installation

pip install goodai-ltm

Usage of LTMAgent

Call the reply method of an LTMAgent instance to get a response from the agent.

from goodai.ltm.agent import LTMAgent

agent = LTMAgent(model="gpt-3.5-turbo")
response = agent.reply("What can you tell me about yourself?")
print(response)

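The model parameter can be the name of any model supported by the litellm library.

The agent maintains the session history automatically. To start a new session, call the new_session method.

agent.new_session()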
print(f"Number of messages in session: {len(agent.session.message_history)}")

The agent has both a conversational memory and a knowledge base. You can tell the agent to store knowledge by calling the add_knowledge method.

agent.clear_knowledge()
agent.add_knowledge("The user's birthday is February 10.")
agent.add_knowledge("Refer to the user as 'boss'.")
response = agent.reply("Today is February 10. I think this is an important date. Can you remind me?")
print(response)

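LTMAgent is a seamless RAG system. The ltm_agent_with_wiki example shows how to add Wikipedia articles to the agent's knowledge base.

You can persist the agent's configuration and its memories/knowledge by obtaining its state as a string via the state_as_text method.

state_text = agent.state_as_text()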
# Persist state_text to secondary storage

To build an agent from state text, call the from_state_text method.

agent2 = LTMAgent.from_state_text(state_text)

Note that this does not restore the conversation session. To persist the conversation session, call the state_as_text method of the session.

from goodai.ltm.agent import LTMAgentSession

session_state_text = agent.session.state_as_text()
# session_state_text can be persisted in secondary storage
# The session.session_id field can serve as an identifier of the persisted session
# Now let's restore the session in agent2
p_session = LTMAgentSession.from_state_text(session_state_text)
agent2.use_session(p_session)
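
The comments above leave the storage step open. A minimal sketch of one way to handle it, assuming plain UTF-8 files on local disk. The helper names save_state_text and load_state_text are hypothetical and not part of GoodAI-LTM, and the example string is only a placeholder for what agent.state_as_text() or agent.session.state_as_text() would return.

```python
import tempfile
from pathlib import Path


def save_state_text(state_text: str, path: Path) -> None:
    """Write a state string to disk (hypothetical helper, not part of goodai-ltm)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(state_text, encoding="utf-8")


def load_state_text(path: Path) -> str:
    """Read a previously saved state string back from disk."""
    return path.read_text(encoding="utf-8")


# Placeholder standing in for agent.state_as_text(); the real format is not assumed here.
example_state = '{"placeholder": "agent state"}'

with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = Path(tmp_dir) / "agent_state.txt"
    save_state_text(example_state, state_path)
    restored_state = load_state_text(state_path)
```

The restored string could then be passed to LTMAgent.from_state_text or LTMAgentSession.from_state_text, as in the snippets above.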

Usage of Text Memory (Low Level)

The following code snippet creates an instance of the LTM, loads some text, and then retrieves the most relevant text passages (expanded chunks) for a query:

from goodai.ltm.mem.auto import AutoTextMemory
mem = AutoTextMemory.create()
mem.add_text("Lorem ipsum dolor sit amet, consectetur adipiscing elit\n")
mem.add_text("Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore\n",
             metadata={'title': 'My document', 'tags': ['latin']})
r_memories = mem.retrieve(query='dolorem eum fugiat quo voluptas nulla pariatur?', k=3)

Creating a text memory instance

A default memory instance can be created as follows:

from goodai.ltm.mem.auto import AutoTextMemory

mem = AutoTextMemory.create()

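You can also configure the memory by passing parameters to the create method. In the following example, the memory uses a "gpt2" tokenizer for chunking, a T5 model for embeddings, a FAISS index for embedding storage (instead of a simple vector database), and a custom chunking configuration.

import torch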
from transformers import AutoTokenizer
from goodai.ltm.embeddings.auto import AutoTextEmbeddingModel
from goodai.ltm.mem.auto import AutoTextMemory
from goodai.ltm.mem.config import TextMemoryConfig
from goodai.ltm.mem.mem_foundation import VectorDbType

embedding_model = AutoTextEmbeddingModel.from_pretrained('st:sentence-transformers/sentence-t5-base')
tokenizer = AutoTokenizer.from_pretrained('gpt2')
config = TextMemoryConfig()
config.chunk_capacity = 30  # tokens
config.queue_capacity = 10000  # chunks
mem = AutoTextMemory.create(emb_model=embedding_model,
                            matching_model=None, 
                            tokenizer=tokenizer,
                            vector_db_type=VectorDbType.FAISS_FLAT_L2, 
                            config=config,
                            device=torch.device('cuda:0'))

Adding text to memory

Call the add_text method to add text to the memory. The text may consist of phrases, sentences, or whole documents.

mem.add_text("Lorem ipsum dolor sit amet, consectetur adipiscing elit\n")

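Internally, the memory chunks and indexes the text automatically.

Text can be associated with an arbitrary metadata dictionary, for example:

mem.add_text("Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore\n",
             metadata={'title': 'My document', 'tags': ['latin']})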

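The memory concatenates text stored with add_text with any text previously sent to the memory, but you can call add_separator to ensure that new text is not added to previously created chunks.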
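
To make the effect of add_separator concrete, here is a toy model of the behavior just described. It is an illustration only and not the library's implementation: the ToyChunker class, the whitespace tokenization, and the fixed chunk_capacity are all assumptions made for this sketch.

```python
class ToyChunker:
    """Toy stand-in for chunked text storage (not goodai-ltm code)."""

    def __init__(self, chunk_capacity=5):
        self.chunk_capacity = chunk_capacity  # maximum tokens per chunk
        self.chunks = []                      # list of token lists
        self._force_new_chunk = True

    def add_text(self, text):
        # Whitespace splitting stands in for a real tokenizer.
        for token in text.split():
            last_is_full = bool(self.chunks) and len(self.chunks[-1]) >= self.chunk_capacity
            if self._force_new_chunk or last_is_full:
                self.chunks.append([])
                self._force_new_chunk = False
            self.chunks[-1].append(token)

    def add_separator(self):
        # The next token starts a fresh chunk, even if the last one has room.
        self._force_new_chunk = True


chunker = ToyChunker(chunk_capacity=5)
chunker.add_text("alpha beta gamma")
chunker.add_text("delta")         # joins the same chunk as the previous text
chunker.add_separator()
chunker.add_text("epsilon zeta")  # starts a new chunk
chunk_texts = [" ".join(chunk) for chunk in chunker.chunks]
```

In this toy model, the text added before the separator shares a chunk, while the text added afterwards begins a new one.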

Retrieval

To retrieve a list of passages associated with a query, call the retrieve method:

r_memories = mem.retrieve("What does Jake propose?", k=2)

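The retrieve method returns a list of objects of type RetrievedMemory, in descending order of relevance. Each retrieved memory has the following properties:

passage: The text of the memory. This corresponds to text found in a matching chunk, but it may be expanded using text from adjacent chunks.
timestamp: The time when the retrieved chunk was created (by default, seconds since the Epoch).
distance: The calculated distance between the query and the chunk passage.
relevance: A number between 0 and 1 representing the relevance of the retrieved memory.
confidence: If a query-passage matching model is available, the probability assigned by that model.
metadata: Metadata associated with the retrieved text, if any.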
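
One possible use of these properties is building a context string for a prompt. The sketch below does not call the library: FakeRetrievedMemory is a stand-in dataclass that mirrors only some of the property names listed above, and the format_memories helper and its relevance threshold are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FakeRetrievedMemory:
    """Stand-in with some of the properties listed above (not the library class)."""
    passage: str
    timestamp: float
    relevance: float
    metadata: dict = field(default_factory=dict)


def format_memories(memories, min_relevance=0.5):
    """Render sufficiently relevant memories as dated lines, most relevant first."""
    lines = []
    for memory in sorted(memories, key=lambda m: m.relevance, reverse=True):
        if memory.relevance < min_relevance:
            continue
        # Interpret the timestamp as seconds since the Epoch, in UTC.
        when = datetime.fromtimestamp(memory.timestamp, tz=timezone.utc)
        title = memory.metadata.get("title", "untitled")
        lines.append(f"[{when:%Y-%m-%d}] ({title}) {memory.passage}")
    return "\n".join(lines)


sample_memories = [
    FakeRetrievedMemory("Jake proposes a picnic.", 0.0, 0.9, {"title": "Chat"}),
    FakeRetrievedMemory("Unrelated passage.", 86400.0, 0.2),
]
context = format_memories(sample_memories)
```

With real results from mem.retrieve, the same loop would read the properties from RetrievedMemory objects instead, with a guard for missing metadata.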

Embedding models

Loading

An embedding model is loaded as follows:

from goodai.ltm.embeddings.auto import AutoTextEmbeddingModel

em = AutoTextEmbeddingModel.from_pretrained(model_name)

The model_name can be one of the following:

A SentenceTransformer (Huggingface), starting with "st:", for example "st:sentence-transformers/multi-qa-mpnet-base-cos-v1".
A flag embedding model, starting with "flag:", for example "flag:BAAI/bge-base-en-v1.5".
An OpenAI embedding model name, starting with "openai:", for example "openai:text-embedding-ada-002".
One of our fine-tuned models:

Name	Base model	# parameters	# storage embeddings
em-MiniLM-p1-01	multi-qa-MiniLM-L6-cos-v1	22.7m	1
em-MiniLM-p3-01	multi-qa-MiniLM-L6-cos-v1	22.7m	3
em-distilroberta-p1-01	sentence-transformers/all-distilroberta-v1	82.1m	1
em-distilroberta-p3-01	sentence-transformers/all-distilroberta-v1	82.1m	3
em-distilroberta-p5-01	sentence-transformers/all-distilroberta-v1	82.1m	5

Usage of embedding models

To get embeddings for a list of queries, call the encode_queries method as follows:

r_emb = em.encode_queries(['hello?'])

This returns a numpy array. To get a Pytorch tensor instead, add the convert_to_tensor parameter:

r_emb = em.encode_queries(['hello?'], convert_to_tensor=True)

To get embeddings for a list of passages, call the encode_corpus method as follows:

s_emb = em.encode_corpus(['it was...', 'the best of...'])

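A query or passage can have more than one embedding. Embedding tensors have 3 axes: batch size, number of embeddings, and number of embedding dimensions. Usually the number of embeddings per query/passage is 1, with some exceptions.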

Query-passage matching models

Loading

A query-passage matching/reranking model can be loaded as follows:

from goodai.ltm.reranking.auto import AutoTextMatchingModel

model = AutoTextMatchingModel.from_pretrained(model_name)

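The model_name can be one of the following:

A "st:" prefix followed by the name of a Huggingface cross-encoder compatible with the SentenceTransformers library, such as "st:cross-encoder/stsb-distilroberta-base"
An "em:" prefix followed by the name of an embedding model supported by this library, such as "em:openai:text-embedding-ada-002" or "em:em-distilroberta-p3-01"

By default, memory instances do not use a query-passage matching model. To enable one, configure the memory as follows:

from goodai.ltm.embeddings.auto import AutoTextEmbeddingModel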
from goodai.ltm.mem.auto import AutoTextMemory
from goodai.ltm.mem.config import TextMemoryConfig
from goodai.ltm.reranking.auto import AutoTextMatchingModel


# Low-resource embedding model
emb_model = AutoTextEmbeddingModel.from_pretrained('em-MiniLM-p1-01')
# QPM model that boosts retrieval accuracy
qpm_model = AutoTextMatchingModel.from_pretrained('em:em-distilroberta-p5-01')
config = TextMemoryConfig()
config.reranking_k_factor = 8
mem = AutoTextMemory.create(matching_model=qpm_model, emb_model=emb_model, config=config)

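The reranking_k_factor setting tells the memory how many candidates to consider for reranking. The user requests k memories, and the reranking algorithm considers k * reranking_k_factor chunks.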
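
The relationship between k and reranking_k_factor can be pictured as a two-stage retrieval. The following is a generic sketch under stated assumptions, not the library's code: the two_stage_retrieve function, the (passage, distance) candidate tuples, and the score_fn callable are all hypothetical.

```python
def two_stage_retrieve(candidates, score_fn, k, reranking_k_factor):
    """Shortlist k * reranking_k_factor nearest candidates, then rerank them.

    candidates: list of (passage, distance) pairs from a first-stage search.
    score_fn: callable returning a match score for a passage (higher is better).
    """
    shortlist_size = k * reranking_k_factor
    # Stage 1: keep the closest candidates by embedding distance.
    shortlist = sorted(candidates, key=lambda item: item[1])[:shortlist_size]
    # Stage 2: reorder the shortlist by the matching score.
    reranked = sorted(shortlist, key=lambda item: score_fn(item[0]), reverse=True)
    return [passage for passage, _ in reranked[:k]]


toy_candidates = [("a", 0.1), ("b", 0.2), ("c", 0.3), ("d", 0.4), ("e", 0.5)]
toy_scores = {"a": 0.2, "b": 0.9, "c": 0.1, "d": 0.8, "e": 1.0}

# With a factor of 2, only the two nearest candidates ("a", "b") are reranked.
narrow = two_stage_retrieve(toy_candidates, toy_scores.get, k=1, reranking_k_factor=2)
# With a factor of 5, all five candidates are reranked.
wide = two_stage_retrieve(toy_candidates, toy_scores.get, k=1, reranking_k_factor=5)
```

A larger factor gives the reranker more candidates to choose from, at the cost of scoring more passages per query.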

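Usage of query-passage matching models

The predict method of the model takes a list of query-passage tuples and returns a list of floats representing estimated match probabilities. Example:

model = AutoTextMatchingModel.from_pretrained('em:em-distilroberta-p5-01')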
sentences = [
    ('Mike: What is your favorite color?', 'Steve: My favorite color is purple.'),
    ('Name the inner planets.', 'It was the best of times, it was the worst of times.'),
]
prob = model.predict(sentences)
print(prob)
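
Assuming the returned list holds one probability per input pair, in the same order, the results can be zipped back onto the pairs. A minimal sketch that uses hard-coded, made-up probabilities in place of a real model.predict call; the filter_matches helper and the 0.5 threshold are hypothetical.

```python
def filter_matches(pairs, probabilities, threshold=0.5):
    """Keep (query, passage, probability) triples at or above the threshold."""
    return [
        (query, passage, prob)
        for (query, passage), prob in zip(pairs, probabilities)
        if prob >= threshold
    ]


pairs = [
    ("Mike: What is your favorite color?", "Steve: My favorite color is purple."),
    ("Name the inner planets.", "It was the best of times, it was the worst of times."),
]
# Made-up values standing in for model.predict(pairs); real outputs will differ.
made_up_probs = [0.9, 0.1]
matches = filter_matches(pairs, made_up_probs)
```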