
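Website | Documentation | Demos | Discourse | Slack Community | Marqo Cloud

Marqo

Marqo is more than a vector database: it is an end-to-end vector search engine for both text and images. Vector generation, storage, and retrieval are handled out of the box through a single API. There is no need to bring your own embeddings.

Why Marqo?

Vector similarity alone is not enough for vector search. Vector search requires more than a vector database: it also requires machine learning (ML) deployment and management, preprocessing and transformation of inputs, and the ability to modify search behaviour without retraining a model. Marqo contains all of these pieces, enabling developers to build vector search into their applications with minimal effort. A full list of features can be found below.

Why bundle embedding generation with vector search?

A vector database is a specialised component for vector similarity and serves only one part of a vector search system. It is "vectors in, vectors out". It still requires the generation of vectors, management of ML models, the associated orchestration, and processing of inputs. Marqo makes this easy by being "documents in, documents out". Preprocessing of text and images, embedding of content, storage of metadata, and deployment of inference and storage are all taken care of by Marqo.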

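Quick start

Here is a code snippet for a minimal example of vector search with Marqo (see Getting started):

Marqo requires Docker. To install Docker, go to the Docker official website. Ensure that Docker has at least 8GB of memory and 50GB of storage. In Docker Desktop, you can do this by clicking the settings icon, then Resources, and selecting 8GB of memory.
Use docker to run Marqo: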

docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest

Install the Marqo client:

pip install marqo

Start indexing and searching! Let's look at a simple example below:

import marqo

mq = marqo.Client(url='http://localhost:8882')

mq.create_index("my-first-index", model="hf/e5-base-v2")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }],
    tensor_fields=["Description"]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)

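Core features

State-of-the-art embeddings

Use the latest machine learning models from PyTorch, Huggingface, OpenAI and more.
Start with a pre-configured model or bring your own.
CPU and GPU support.

⚡ Performance

Embeddings are stored in in-memory HNSW indexes, achieving cutting-edge search speeds.
Scale to hundred-million document indexes with horizontal index sharding.
Async and non-blocking data upload and search.

Documents-in-documents-out

Vector generation, storage, and retrieval are provided out of the box.
Build search, entity resolution, and data exploration applications using your text and images.
Build complex semantic queries by combining weighted search terms.
Filter search results using Marqo's query DSL.
Store unstructured data and semi-structured metadata together in documents, using a range of supported datatypes such as bools, ints and keywords.

Managed cloud

Low-latency optimised deployment of Marqo.
Scale inference at the click of a button.
High availability.
24/7 support.
Access control.
Learn more here.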

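Integrations

Marqo is integrated into popular AI and data processing frameworks, with more on the way.

Haystack

Haystack is an open-source framework for building applications that make use of NLP technology such as LLMs, embedding models and more. This integration allows you to use Marqo as your Document Store for Haystack pipelines such as retrieval augmentation, question answering, document search and more.

Griptape

Griptape enables safe and reliable deployment of LLM-based agents for enterprise applications, and the MarqoVectorStoreDriver gives these agents access to scalable search with your own data. This integration lets you leverage open-source or custom fine-tuned models through Marqo to deliver relevant results to your LLMs.

LangChain

This integration lets you leverage open-source or custom fine-tuned models through Marqo for LangChain applications with a vector search component. The Marqo vector store implementation can plug into existing chains such as Retrieval QA and Conversational Retrieval QA.

⋙ Hamilton

This integration lets you leverage open-source or custom fine-tuned models through Marqo for Hamilton LLM applications.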

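Learn more about Marqo

Quick start	Build your first application with Marqo in under 5 minutes.
Marqo for image data	Building advanced image search with Marqo.
Marqo for text	Building a multilingual database in Marqo.
Integrating Marqo with GPT	Making GPT a subject matter expert by using Marqo as a knowledge base.
Marqo for creative AI	Combining Stable Diffusion with semantic search to generate and categorise 100k images of hot dogs.
Marqo and speech data	Add diarisation and transcription to preprocess audio for Q&A with Marqo and ChatGPT.
Marqo for content moderation	Building advanced image search with Marqo to find and remove content.
☁️ Getting started with Marqo Cloud	How to get set up and running with Marqo Cloud, from your first login through to building your first application with Marqo.
Marqo for e-commerce	This project is a web application with a frontend and backend using Python, Flask, ReactJS, and Typescript. The frontend is a ReactJS application that makes requests to the backend, which is a Flask application. The backend makes requests to your Marqo Cloud API.
Marqo chatbot	In this guide we build a chatbot application using Marqo and OpenAI's ChatGPT API. We start with an existing code base and then walk through how to customise the behaviour.
Features	Marqo's core features.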

Getting started

Marqo requires Docker. To install Docker, go to the Docker official website. Ensure that Docker has at least 8GB of memory and 50GB of storage.
Use docker to run Marqo:

docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -p 8882:8882 marqoai/marqo:latest

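Note: If your marqo container keeps getting killed, this is most likely due to a lack of memory allocated to Docker. Increasing the memory limit for Docker to at least 6GB (8GB recommended) in your Docker settings may fix the problem.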
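
If you want to check the memory available to Docker before starting the container, something like the following sketch may help. It assumes the docker CLI is on your PATH and uses `docker info --format '{{.MemTotal}}'`, which reports the daemon's total memory in bytes. The function names and the 6 GiB default threshold (taken from the note above) are illustrative choices, not part of Marqo. Docker Desktop may report slightly less than the configured value, so treat the result as a rough guide.

```python
import subprocess

GIB = 1024 ** 3


def has_enough_memory(mem_total_bytes: int, minimum_gib: float = 6.0) -> bool:
    """Return True if the reported memory meets the minimum (in GiB)."""
    return mem_total_bytes >= minimum_gib * GIB


def docker_memory_bytes() -> int:
    """Ask the Docker daemon how much memory it can use, in bytes.

    Raises FileNotFoundError if the docker CLI is not installed and
    subprocess.CalledProcessError if the daemon is not reachable.
    """
    out = subprocess.run(
        ["docker", "info", "--format", "{{.MemTotal}}"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())
```

A typical use would be `has_enough_memory(docker_memory_bytes())` before running `docker run`.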

Install the Marqo client:

pip install marqo

Start indexing and searching! Let's look at a simple example below:

import marqo

mq = marqo.Client(url='http://localhost:8882')

mq.create_index("my-first-index")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }],
    tensor_fields=["Description"]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)

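mq is the client that wraps the marqo API.
create_index() creates a new index with default settings. You can optionally specify which model to use. For example, mq.create_index("my-first-index", model="hf/all_datasets_v4_MiniLM-L6") will create an index with the default text model hf/all_datasets_v4_MiniLM-L6. Experimentation with different models is often required to achieve the best retrieval for your specific use case. Different models also offer a tradeoff between inference speed and relevancy. See here for the full list of models.
add_documents() takes a list of documents, represented as Python dicts, for indexing. tensor_fields refers to the fields that will be indexed as vector collections and made searchable.
You can optionally set a document's ID with the special _id field. Otherwise, Marqo will generate one.

Let's have a look at the results: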

# let's print out the results:
import pprint
pprint.pprint(results)

{
    'hits': [
        {
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
                           'communications for astronauts',
            '_highlights': [{
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            }],
            '_id': 'article_591',
            '_score': 0.61938936
        },
        {
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': [{'Title': 'The Travels of Marco Polo'}],
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 0.60237324
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}

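Each hit corresponds to a document that matched the search query.
Hits are ordered from most to least matching.
limit is the maximum number of hits to be returned. This can be set as a parameter during search.
Each hit has a _highlights field. This is the part of the document that matched the query best.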
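
As a small illustration of reading such a response, the sketch below walks the hits in ranked order and collects each hit's ID, score, and the names of the highlighted fields. The helper name summarise_hits is hypothetical, and the sample dict is a trimmed copy of the response shown above (the highlight text is abbreviated); it assumes _highlights is a list of dicts, as in that output.

```python
def summarise_hits(results: dict) -> list:
    """Return (_id, _score, highlighted field names) for each hit, in ranked order."""
    summary = []
    for hit in results.get("hits", []):
        # Each entry in _highlights maps a field name to the matching text.
        highlighted = [field for h in hit.get("_highlights", []) for field in h]
        summary.append((hit["_id"], hit["_score"], highlighted))
    return summary


# A trimmed copy of the response shown above.
sample = {
    "hits": [
        {"_id": "article_591", "_score": 0.61938936,
         "_highlights": [{"Description": "The EMU is a spacesuit (abbreviated)"}]},
        {"_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a", "_score": 0.60237324,
         "_highlights": [{"Title": "The Travels of Marco Polo"}]},
    ],
    "limit": 10,
}

for doc_id, score, fields in summarise_hits(sample):
    print(f"{doc_id}  score={score:.3f}  highlights={fields}")
```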

Other basic operations

Get document

Retrieve a document by ID.

result = mq.index("my-first-index").get_document(document_id="article_591")

Note that adding a document with add_documents again, using the same _id, will cause the document to be updated.
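
To make the _id behaviour concrete, here is a toy in-memory stand-in, not Marqo itself. ToyStore and its methods are hypothetical and only mimic two behaviours described in this README: a missing _id is generated, and re-adding a document with an existing _id updates it. The toy replaces the whole stored document on update; whether Marqo replaces or merges fields is not stated here.

```python
import uuid


class ToyStore:
    """In-memory stand-in illustrating _id assignment and update-on-re-add."""

    def __init__(self):
        self.docs = {}

    def add_documents(self, documents):
        ids = []
        for doc in documents:
            doc = dict(doc)  # copy so the caller's dict is untouched
            # Use the supplied _id, or generate one if it is missing.
            doc_id = doc.setdefault("_id", str(uuid.uuid4()))
            self.docs[doc_id] = doc  # same _id overwrites the earlier version
            ids.append(doc_id)
        return ids

    def get_document(self, document_id):
        return self.docs[document_id]


store = ToyStore()
store.add_documents([{"_id": "article_591", "Title": "EMU"}])
store.add_documents([{"_id": "article_591", "Title": "Extravehicular Mobility Unit (EMU)"}])
generated = store.add_documents([{"Title": "The Travels of Marco Polo"}])

print(store.get_document("article_591")["Title"])
print(len(store.docs))
```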

Get index stats

Get information about an index.

results = mq.index("my-first-index").get_stats()

Lexical search

Perform a keyword search.

result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)

Multimodal and cross-modal search

To power image and text search, Marqo allows users to plug and play with CLIP models from HuggingFace. Note that if you do not configure multimodal search, image URLs will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:

settings = {
    "treat_urls_and_pointers_as_images": True,  # allows us to find an image file and index it
    "model": "ViT-L/14"
}
response = mq.create_index("my-multimodal-index", **settings)

Images can then be added within documents as follows. You can use URLs from the internet (for example S3) or from the disk of the machine:

response = mq.index("my-multimodal-index").add_documents([{
    "My_Image": "https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_realistic.png",
    "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
    "_id": "hippo-facts"
}], tensor_fields=["My_Image"])

You can then search the image field using text.

results = mq.index("my-multimodal-index").search('animal')

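Searching using an image

Searching using an image can be achieved by providing the image link.

results = mq.index("my-multimodal-index").search('https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_statue.png')

Searching using weights in queries

Queries can also be provided as dictionaries where each key is a query and its corresponding value is a weight. This allows for more advanced queries consisting of multiple components with weightings towards or against them; a query can be negated with a negative weight.

The example below applies this to a scenario where a user wants to ask a question but also negate results that match a certain semantic criterion.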

import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

mq.create_index("my-weighted-query-index")

mq.index("my-weighted-query-index").add_documents(
    [
        {
            "Title": "Smartphone",
            "Description": "A smartphone is a portable computer device that combines mobile telephone "
            "functions and computing functions into one unit.",
        },
        {
            "Title": "Telephone",
            "Description": "A telephone is a telecommunications device that permits two or more users to "
            "conduct a conversation when they are too far apart to be easily heard directly.",
        },
        {
            "Title": "Thylacine",
            "Description": "The thylacine, also commonly known as the Tasmanian tiger or Tasmanian wolf, "
            "is an extinct carnivorous marsupial. "
            "The last known of its species died in 1936.",
        },
    ],
    tensor_fields=["Description"],
)

# initially we ask for a type of communications device which is popular in the 21st century
query = {
    # a weighting of 1.1 gives this query slightly more importance
    "I need to buy a communications device, what should I get?": 1.1,
    # a weighting of 1 gives this query a neutral importance
    # this will lead to 'Smartphone' being the top result
    "The device should work like an intelligent computer.": 1.0,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("Query 1:")
pprint.pprint(results)

# now we ask for a type of communications device which predates the 21st century
query = {
    # a weighting of 1 gives this query a neutral importance
    "I need to buy a communications device, what should I get?": 1.0,
    # a negative weighting (here -0.3) gives this query a negation effect
    # this will lead to 'Telephone' being the top result
    "The device should work like an intelligent computer.": -0.3,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("\nQuery 2:")
pprint.pprint(results)
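
To build intuition for why a negative weight flips the top result, here is a conceptual sketch in plain Python. It assumes each weighted term is embedded separately and the query vector is their weighted sum, scored by cosine similarity; this README does not specify how Marqo combines weighted terms, so treat that as an assumption. The 2-D vectors are made up for illustration and are not real model embeddings.

```python
import math


def combine(weighted_vectors):
    """Weighted sum of (vector, weight) pairs."""
    dim = len(weighted_vectors[0][0])
    return [sum(w * v[i] for v, w in weighted_vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical embeddings: axis 0 ~ "communications device", axis 1 ~ "computer-like".
docs = {"Smartphone": [0.7, 0.7], "Telephone": [1.0, 0.0], "Thylacine": [-0.5, -0.5]}
q_device, q_computer = [1.0, 0.0], [0.0, 1.0]


def top_result(weights):
    """Return the title whose vector is closest to the weighted query."""
    query = combine([(q_device, weights[0]), (q_computer, weights[1])])
    return max(docs, key=lambda title: cosine(query, docs[title]))


print(top_result((1.1, 1.0)))   # Smartphone
print(top_result((1.0, -0.3)))  # Telephone
```

With both weights positive, the query points towards documents that are both device-like and computer-like; the negative weight pushes the query away from the computer-like direction, so the plain telephone ranks first.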

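Creating and searching indexes with multimodal combination fields

Marqo lets you have indexes with multimodal combination fields. A multimodal combination field can combine text and images into one field. This allows documents to be scored across the combined text and image fields together. It also allows a single vector representation instead of many, which saves on storage. The relative weighting of each component can be set per document.

The example below demonstrates this by retrieving caption and image pairs using multiple types of queries.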

import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

settings = {"treat_urls_and_pointers_as_images": True, "model": "ViT-L/14"}

mq.create_index("my-first-multimodal-index", **settings)

mq.index("my-first-multimodal-index").add_documents(
    [
        {
            "Title": "Flying Plane",
            "caption": "An image of a passenger plane flying in front of the moon.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
        },
        {
            "Title": "Red Bus",
            "caption": "A red double decker London bus traveling to Aldwych",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg",
        },
        {
            "Title": "Horse Jumping",
            "caption": "A person riding a horse over a jump in a competition.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image1.jpg",
        },
    ],
    # Create the mappings, here we define our captioned_image mapping
    # which weights the image more heavily than the caption - these pairs
    # will be represented by a single vector in the index
    mappings={
        "captioned_image": {
            "type": "multimodal_combination",
            "weights": {
                "caption": 0.3,
                "image": 0.7
            }
        }
    },
    # We specify which fields to create vectors for.
    # Note that captioned_image is treated as a single field.
    tensor_fields=["captioned_image"]
)

# Search this index with a simple text query
results = mq.index("my-first-multimodal-index").search(
    q="Give me some images of vehicles and modes of transport. I am especially interested in air travel and commercial aeroplanes."
)

print("Query 1:")
pprint.pprint(results)

# search the index with a query that uses weighted components
results = mq.index("my-first-multimodal-index").search(
    q={
        "What are some vehicles and modes of transport?": 1.0,
        "Aeroplanes and other things that fly": -1.0
    },
)
print("\nQuery 2:")
pprint.pprint(results)

results = mq.index("my-first-multimodal-index").search(
    q={"Animals of the Perissodactyla order": -1.0}
)
print("\nQuery 3:")
pprint.pprint(results)
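
The idea of representing a caption and image pair with a single weighted vector can be sketched in plain Python. This is a conceptual illustration only: it assumes the per-field vectors are combined as a weighted sum and then normalised, which this README does not confirm, and the 2-D vectors are hypothetical placeholders for real model embeddings. The weights mirror the captioned_image mapping above.

```python
import math


def normalise(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def combine_fields(field_vectors, weights):
    """Weighted sum of per-field vectors, normalised to unit length."""
    dim = len(next(iter(field_vectors.values())))
    mixed = [
        sum(weights[name] * vec[i] for name, vec in field_vectors.items())
        for i in range(dim)
    ]
    return normalise(mixed)


# Hypothetical unit-length embeddings for one document's caption and image.
vectors = {"caption": [1.0, 0.0], "image": [0.0, 1.0]}
weights = {"caption": 0.3, "image": 0.7}  # same weights as the mapping above

captioned_image = combine_fields(vectors, weights)
print(captioned_image)
```

Because the image weight is larger, the combined vector lies closer to the image embedding than to the caption embedding, and only this one vector needs to be stored per document.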

Delete documents

Delete documents.

results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])

Delete index

Delete an index.

results = mq.index("my-first-index").delete()

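Running Marqo open source in production

We support Kubernetes templates for Marqo, which you can deploy on a cloud provider of your choice. Marqo's Kubernetes implementation allows you to deploy clusters with replicas, multiple storage shards and multiple inference nodes. The repo can be found here: https://github.com/marqo-ai/marqo-on-kubernetes

If you are looking for a fully managed cloud service, you can sign up for Marqo Cloud here: https://cloud.marqo.ai.

Documentation

The full documentation for Marqo can be found here: https://docs.marqo.ai/.

Warning

Note that you should not run other applications on Marqo's Vespa cluster, as Marqo automatically changes and adapts the settings on the cluster.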

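Contributors

Marqo is a community project with the goal of making tensor search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.

Dev setup

Create a virtual env: python -m venv ./venv.
Activate the virtual environment: source ./venv/bin/activate.
Install requirements from the requirements file: pip install -r requirements.txt.
Run tests by running the tox file. cd into this directory and then run "tox".
If you update dependencies, make sure to delete the .tox directory and rerun.

Merge instructions:

Run the full test suite (by using the command tox in this directory).
Create a pull request with an attached GitHub issue.

Support

Ask questions and share your creations with the community on our Discourse forum.
Join our Slack community and chat with other community members about ideas.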
