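Marqo is more than a vector database: it is an end-to-end vector search engine for both text and images. Vector generation, storage, and retrieval are handled out of the box through a single API. There is no need to bring your own embeddings.
Vector similarity alone is not enough for vector search. Vector search requires more than a vector database: it also requires machine learning (ML) deployment and management, preprocessing and transformation of inputs, and the ability to modify search behavior without retraining a model. Marqo contains all of these pieces, so developers can build vector search into their applications with minimal effort. A full list of features can be found below.
Vector databases are specialized components for vector similarity and serve only one part of a vector search system. They are "vectors in, vectors out". They still require the generation of vectors, management of ML models, the associated orchestration, and processing of the inputs. Marqo makes this easy by being "documents in, documents out". Preprocessing of text and images, embedding the content, storing metadata, and deployment of inference and storage are all taken care of by Marqo.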
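To make the "documents in, documents out" idea concrete, here is a toy, standard-library-only sketch. It is not Marqo's implementation: `embed` is a stand-in bag-of-words function rather than an ML model, and `ToyDocumentIndex` is a hypothetical class invented for this illustration. The point is only that the caller passes documents and receives documents, while vectors stay internal.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a stand-in for a real ML model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyDocumentIndex:
    """Hypothetical wrapper: documents go in, documents come out."""

    def __init__(self):
        self._docs = []     # the original documents (metadata)
        self._vectors = []  # one internal vector per document

    def add_documents(self, docs, field):
        # Vector generation happens inside the index, not in caller code.
        for doc in docs:
            self._docs.append(doc)
            self._vectors.append(embed(doc[field]))

    def search(self, q, limit=10):
        query_vector = embed(q)
        ranked = sorted(
            zip(self._docs, self._vectors),
            key=lambda pair: cosine(query_vector, pair[1]),
            reverse=True,
        )
        return [doc for doc, _ in ranked[:limit]]


index = ToyDocumentIndex()
index.add_documents(
    [
        {"Title": "The Travels of Marco Polo",
         "Description": "A 13th-century travelogue describing Polo's travels"},
        {"Title": "Extravehicular Mobility Unit (EMU)",
         "Description": "The EMU is a spacesuit that provides protection for astronauts"},
    ],
    field="Description",
)
top = index.search("spacesuit for astronauts", limit=1)
print(top[0]["Title"])
```

A plain vector database would correspond to only the `_vectors` list and the similarity ranking; everything else in the class is the part that a "vectors in, vectors out" system leaves to the application.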
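Here is a code snippet for a minimal example of vector search with Marqo (see Getting Started):
Marqo requires Docker. To install Docker, go to the official Docker website. Make sure that Docker has at least 8GB of memory and 50GB of storage. In Docker Desktop, you can do this by clicking the settings icon, then Resources, and selecting 8GB of memory.
Use docker to run Marqo: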
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest
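Once the container has started, it can be handy to confirm that something is listening on the published port before running client code. The helper below is a small sketch using only the Python standard library; `port_is_open` is a name made up for this example, and an open port only shows that the container accepts TCP connections, not that Marqo has finished loading its models.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# 8882 is the port published by the docker run command above.
if port_is_open("localhost", 8882):
    print("Port 8882 is accepting connections")
else:
    print("Nothing is listening on port 8882 yet")
```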
Install the Marqo client:
pip install marqo
import marqo

mq = marqo.Client(url="http://localhost:8882")
mq.create_index("my-first-index", model="hf/e5-base-v2")

mq.index("my-first-index").add_documents(
    [
        {
            "Title": "The Travels of Marco Polo",
            "Description": "A 13th-century travelogue describing Polo's travels",
        },
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "Description": "The EMU is a spacesuit that provides environmental protection, "
                           "mobility, life support, and communications for astronauts",
            "_id": "article_591",
        },
    ],
    tensor_fields=["Description"],
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)
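In the example above only the second document carries an explicit `_id`. If you want every document to have a predictable ID, so that it can later be fetched or deleted by a known value, one option is to assign IDs on the client side before indexing. The sketch below is a hypothetical convention, not part of the Marqo client: `with_stable_ids` and the `doc_` prefix are invented for illustration.

```python
import hashlib


def with_stable_ids(docs, key_field="Title"):
    """Return copies of docs in which every document has an `_id`.

    Documents that already carry `_id` keep it. The others get an ID derived
    from a SHA-1 hash of `key_field` (an assumed convention for this sketch).
    """
    prepared = []
    for doc in docs:
        doc = dict(doc)  # shallow copy so the caller's dicts are not mutated
        if "_id" not in doc:
            digest = hashlib.sha1(doc[key_field].encode("utf-8")).hexdigest()
            doc["_id"] = f"doc_{digest[:12]}"
        prepared.append(doc)
    return prepared


docs = with_stable_ids([
    {"Title": "The Travels of Marco Polo",
     "Description": "A 13th-century travelogue describing Polo's travels"},
    {"Title": "Extravehicular Mobility Unit (EMU)",
     "Description": "The EMU is a spacesuit",
     "_id": "article_591"},
])
print([d["_id"] for d in docs])
```

The returned list can be passed to `add_documents` in place of the original one. Because the ID depends only on the title, two documents with the same title would collide, so choose a `key_field` that is unique in your data.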
⚡ Performance
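Marqo is integrated into popular AI and data processing frameworks, with more on the way.
Haystack is an open-source framework for building applications that make use of NLP technology such as LLMs, embedding models, and more. This integration lets you use Marqo as the Document Store for Haystack pipelines such as retrieval augmentation, question answering, and document search.
Griptape enables safe and reliable deployment of LLM-based agents for enterprise applications, and the MarqoVectorStoreDriver gives these agents access to scalable search over your own data. This integration lets you use open-source or custom fine-tuned models through Marqo to deliver relevant results to your LLMs.
This integration lets you use open-source or custom fine-tuned models through Marqo in LangChain applications that have a vector search component. The Marqo vector store implementation can plug into existing chains such as Retrieval QA and Conversational Retrieval QA.
⋙ Hamilton
This integration lets you use open-source or custom fine-tuned models through Marqo in Hamilton LLM applications.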
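mq is the client that wraps the Marqo API.
create_index() creates a new index with default settings. You can optionally specify which model to use, for example mq.create_index("my-first-index", model="hf/all_datasets_v4_MiniLM-L6").
add_documents() takes a list of documents, represented as Python dicts, for indexing. tensor_fields names the fields to create vectors for.
You can optionally set a document's ID with the _id field. Otherwise, Marqo will generate one.
Let's have a look at the results: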
# let's print out the results:
import pprint
pprint.pprint(results)

{
    'hits': [
        {
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
                           'communications for astronauts',
            '_highlights': [{
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            }],
            '_id': 'article_591',
            '_score': 0.61938936
        },
        {
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': [{'Title': 'The Travels of Marco Polo'}],
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 0.60237324
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}
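Since the response is an ordinary Python dictionary, the hits can be post-processed with plain Python. The sketch below works on a hard-coded response shaped like the one printed above (the `Description` fields are trimmed for brevity), so it runs without a Marqo server. `summarize_hits` and the `min_score` cut-off are hypothetical helpers for this example, not part of the Marqo client.

```python
# A response shaped like the one printed above, with values copied from it.
results = {
    "hits": [
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "_id": "article_591",
            "_score": 0.61938936,
        },
        {
            "Title": "The Travels of Marco Polo",
            "_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
            "_score": 0.60237324,
        },
    ],
    "limit": 10,
    "processingTimeMs": 49,
    "query": "What is the best outfit to wear on the moon?",
}


def summarize_hits(response, min_score=0.0):
    """Return (id, title, score) tuples for hits scoring at least min_score."""
    return [
        (hit["_id"], hit["Title"], hit["_score"])
        for hit in response["hits"]
        if hit["_score"] >= min_score
    ]


for doc_id, title, score in summarize_hits(results):
    print(f"{score:.3f}  {doc_id}  {title}")

# Keep only hits above an arbitrary, illustrative threshold.
best = summarize_hits(results, min_score=0.61)
```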
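Each hit has a _highlights field. This is the part of the document that best matched the query.
Retrieve a document by ID:
result = mq.index("my-first-index").get_document(document_id="article_591")

# Get statistics about the index
results = mq.index("my-first-index").get_stats()

# Perform a keyword (lexical) search instead of a vector search
result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)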
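To power image and text search, Marqo lets users plug and play with CLIP models from HuggingFace. Note that if you do not configure multimodal search, image URLs will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below: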
settings = {
    "treat_urls_and_pointers_as_images": True,  # allows us to find an image file and index it
    "model": "ViT-L/14",
}
response = mq.create_index("my-multimodal-index", **settings)
Images can then be added within documents as follows. You can use URLs from the internet (for example, S3) or from the disk of the machine:
response = mq.index("my-multimodal-index").add_documents([{
    "My_Image": "https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_realistic.png",
    "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
    "_id": "hippo-facts"
}], tensor_fields=["My_Image"])

# Search the image field using a text query
results = mq.index("my-multimodal-index").search('animal')

# Search using an image by passing its URL as the query
results = mq.index("my-multimodal-index").search('https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_statue.png')
import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

mq.create_index("my-weighted-query-index")

mq.index("my-weighted-query-index").add_documents(
    [
        {
            "Title": "Smartphone",
            "Description": "A smartphone is a portable computer device that combines mobile telephone "
                           "functions and computing functions into one unit.",
        },
        {
            "Title": "Telephone",
            "Description": "A telephone is a telecommunications device that permits two or more users to "
                           "conduct a conversation when they are too far apart to be easily heard directly.",
        },
        {
            "Title": "Thylacine",
            "Description": "The thylacine, also commonly known as the Tasmanian tiger or Tasmanian wolf, "
                           "is an extinct carnivorous marsupial. "
                           "The last known of its species died in 1936.",
        },
    ],
    tensor_fields=["Description"],
)

# A weighted query is a dict: each key is a query string, each value is its weight.
# initially we ask for a type of communications device which is popular in the 21st century
query = {
    # a weighting of 1.1 gives this query slightly more importance
    "I need to buy a communications device, what should I get?": 1.1,
    # a weighting of 1 gives this query a neutral importance
    # this will lead to 'Smartphone' being the top result
    "The device should work like an intelligent computer.": 1.0,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("Query 1:")
pprint.pprint(results)

# now we ask for a type of communications which predates the 21st century
query = {
    # a weighting of 1 gives this query a neutral importance
    "I need to buy a communications device, what should I get?": 1.0,
    # a negative weighting (-0.3 here) gives this query a negation effect
    # this will lead to 'Telephone' being the top result
    "The device should work like an intelligent computer.": -0.3,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("\nQuery 2:")
pprint.pprint(results)
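One way to build intuition for why a negative weight pushes 'Smartphone' down is to picture each query string as a vector and the weighted query as a weighted sum of those vectors. The sketch below does exactly that with made-up two-dimensional vectors. It is an assumption-laden illustration of the idea, not a description of how Marqo combines queries internally, and all names and numbers in it are invented.

```python
import math


def combine_weighted(vectors_with_weights):
    """Weighted sum of equal-length vectors, scaled to unit length."""
    dim = len(vectors_with_weights[0][0])
    combined = [0.0] * dim
    for vec, weight in vectors_with_weights:
        for i, x in enumerate(vec):
            combined[i] += weight * x
    norm = math.sqrt(sum(x * x for x in combined))
    return [x / norm for x in combined] if norm else combined


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up embeddings: axis 0 ~ "communications device", axis 1 ~ "computer-like".
docs = {
    "Smartphone": [0.7, 0.7],
    "Telephone": [1.0, 0.1],
}
q_device = [1.0, 0.0]    # "I need to buy a communications device, ..."
q_computer = [0.0, 1.0]  # "The device should work like an intelligent computer."


def rank(w_device, w_computer):
    """Rank document titles by similarity to the combined weighted query."""
    q = combine_weighted([(q_device, w_device), (q_computer, w_computer)])
    return sorted(docs, key=lambda title: cosine(q, docs[title]), reverse=True)


print(rank(1.1, 1.0))   # ['Smartphone', 'Telephone']
print(rank(1.0, -0.3))  # ['Telephone', 'Smartphone']
```

With positive weights on both components the combined vector points toward documents that are both devices and computer-like; with a negative weight on the second component it points away from computer-like documents, so the plain telephone ranks first.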
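Marqo lets you create indexes with multimodal combination fields. A multimodal combination field combines text and images into a single field, so documents are scored against the combined text and image content together. It also means a document is represented by a single vector instead of several, which saves storage. The relative weighting of each component can be set per document.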
import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

settings = {"treat_urls_and_pointers_as_images": True, "model": "ViT-L/14"}

mq.create_index("my-first-multimodal-index", **settings)

mq.index("my-first-multimodal-index").add_documents(
    [
        {
            "Title": "Flying Plane",
            "caption": "An image of a passenger plane flying in front of the moon.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
        },
        {
            "Title": "Red Bus",
            "caption": "A red double decker London bus traveling to Aldwych",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg",
        },
        {
            "Title": "Horse Jumping",
            "caption": "A person riding a horse over a jump in a competition.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image1.jpg",
        },
    ],
    # Create the mappings, here we define our captioned_image mapping
    # which weights the image more heavily than the caption - these pairs
    # will be represented by a single vector in the index
    mappings={
        "captioned_image": {
            "type": "multimodal_combination",
            "weights": {
                "caption": 0.3,
                "image": 0.7,
            },
        }
    },
    # We specify which fields to create vectors for.
    # Note that captioned_image is treated as a single field.
    tensor_fields=["captioned_image"],
)

# Search this index with a simple text query
results = mq.index("my-first-multimodal-index").search(
    q="Give me some images of vehicles and modes of transport. I am especially interested in air travel and commercial aeroplanes."
)

print("Query 1:")
pprint.pprint(results)

# search the index with a query that uses weighted components
results = mq.index("my-first-multimodal-index").search(
    q={
        "What are some vehicles and modes of transport?": 1.0,
        "Aeroplanes and other things that fly": -1.0,
    },
)
print("\nQuery 2:")
pprint.pprint(results)

# search with a single negatively weighted component
results = mq.index("my-first-multimodal-index").search(
    q={"Animals of the Perissodactyla order": -1.0}
)
print("\nQuery 3:")
pprint.pprint(results)
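The storage saving mentioned for multimodal combination fields comes from keeping one vector per document instead of one per field. As a rough mental model only (Marqo's actual combination procedure is not described here and may differ), the sketch below merges a made-up caption vector and a made-up image vector using the same 0.3 and 0.7 weights as the mapping above. `combine_fields` is a hypothetical helper.

```python
import math


def combine_fields(field_vectors, weights):
    """Merge per-field vectors into one unit-length vector using the given weights."""
    dim = len(next(iter(field_vectors.values())))
    combined = [0.0] * dim
    for name, vec in field_vectors.items():
        for i, x in enumerate(vec):
            combined[i] += weights[name] * x
    norm = math.sqrt(sum(x * x for x in combined))
    return [x / norm for x in combined]


# Made-up 3-D embeddings for the caption and image of a single document.
field_vectors = {
    "caption": [1.0, 0.0, 0.0],
    "image": [0.0, 1.0, 0.0],
}
weights = {"caption": 0.3, "image": 0.7}  # same weights as the mapping above

captioned_image = combine_fields(field_vectors, weights)

vectors_if_separate = len(field_vectors)  # one vector per field
vectors_if_combined = 1                   # one vector for the combined field

print([round(x, 3) for x in captioned_image])  # [0.394, 0.919, 0.0]
print(vectors_if_separate, "->", vectors_if_combined)
```

The combined vector leans toward the image direction because the image weight is larger, which mirrors the intent of weighting the image more heavily than the caption.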
# Delete documents by ID
results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])

# Delete the index
results = mq.index("my-first-index").delete()
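We support Kubernetes templates for Marqo, which you can deploy on the cloud provider of your choice. Marqo's Kubernetes implementation lets you deploy clusters with replicas, multiple storage shards, and multiple inference nodes. The repository can be found here: https://github.com/marqo-ai/marqo-on-kubernetes
If you are looking for a fully managed cloud service, you can sign up for Marqo Cloud here: https://cloud.marqo.ai.
The full documentation for Marqo can be found here: https://docs.marqo.ai/.
Note that you should not run other applications on Marqo's Vespa cluster, because Marqo automatically changes and adapts the settings on the cluster.
Marqo is a community project with the goal of making tensor search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.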
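Create a virtual environment: python -m venv ./venv
Activate the virtual environment: source ./venv/bin/activate
Install the requirements from the requirements file: pip install -r requirements.txt
Run the tests by running the tox file: cd into this directory, then run "tox".
If you update dependencies, make sure to delete the .tox directory and rerun.
Create a pull request with an attached GitHub issue.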