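Marqo is more than a vector database: it is an end-to-end vector search engine for both text and images. Vector generation, storage, and retrieval are handled out of the box through a single API. There is no need to bring your own embeddings.
Vector similarity alone is not enough for vector search. Vector search requires more than a vector database: it also needs machine learning (ML) deployment and management, preprocessing and transformation of inputs, and the ability to modify search behavior without retraining a model. Marqo contains all of these pieces, so developers can build vector search into their applications with minimal effort. A full list of features can be found below.
Vector databases are specialized components for vector similarity and serve only one part of a vector search system. They are "vectors in, vectors out". They still require the generation of vectors, management of the ML models, the associated orchestration, and processing of the inputs. Marqo makes all of this easy by being "documents in, documents out". Preprocessing of text and images, embedding the content, storing metadata, and deploying inference and storage are all handled by Marqo.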
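To make the "documents in, documents out" idea concrete, here is a minimal, self-contained sketch. It is not Marqo code: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `ToyDocumentSearch` is an illustrative class invented for this example. It only shows that the caller hands over documents and a text query, while embedding and similarity scoring happen inside the engine.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an ML embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyDocumentSearch:
    """Illustrative only: documents go in, documents come out."""

    def __init__(self):
        self._entries = []  # list of (document, vector) pairs

    def add_documents(self, docs, tensor_fields):
        for doc in docs:
            # Only the fields named in tensor_fields are embedded.
            text = " ".join(str(doc[f]) for f in tensor_fields if f in doc)
            self._entries.append((doc, toy_embed(text)))

    def search(self, q, limit=10):
        query_vector = toy_embed(q)
        scored = [(cosine(query_vector, vec), doc) for doc, vec in self._entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [dict(doc, _score=score) for score, doc in scored[:limit]]


engine = ToyDocumentSearch()
engine.add_documents(
    [
        {"Title": "Spacesuit", "Description": "a spacesuit for astronauts on the moon"},
        {"Title": "Travelogue", "Description": "a travelogue about travels in asia"},
    ],
    tensor_fields=["Description"],
)
hits = engine.search("suit for the moon")
print([hit["Title"] for hit in hits])
```

A plain vector database would expose only the equivalent of `cosine` over stored vectors; everything else in the class is what the caller would otherwise have to build.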
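Here is a code snippet for a minimal example of vector search with Marqo (see Getting Started):
Marqo requires Docker. To install Docker, go to the official Docker website. Make sure Docker has at least 8GB of memory and 50GB of storage. In Docker Desktop, you can do this by clicking the settings icon, then Resources, and selecting 8GB of memory.
Use docker to run Marqo: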
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest
pip install marqo
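Before running the client code, it can help to confirm that something is listening on the port published by the `docker run` command above. The following is a small sketch using only the Python standard library; `is_port_open` is a hypothetical helper name, and the check only proves that the TCP port accepts connections, not that Marqo has finished loading its models.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# 8882 is the port published by the docker run command above.
print("Marqo port reachable:", is_port_open("localhost", 8882))
```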
import marqo

mq = marqo.Client(url="http://localhost:8882")
mq.create_index("my-first-index", model="hf/e5-base-v2")

# Each document is a dictionary; only "Description" is embedded as a vector.
mq.index("my-first-index").add_documents(
    [
        {
            "Title": "The Travels of Marco Polo",
            "Description": "A 13th-century travelogue describing Polo's travels",
        },
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "Description": "The EMU is a spacesuit that provides environmental protection, "
                           "mobility, life support, and communications for astronauts",
            "_id": "article_591",
        },
    ],
    tensor_fields=["Description"],
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)
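⚡ Performance
Marqo is integrated into popular AI and data processing frameworks, with more on the way.
Haystack is an open-source framework for building applications that use NLP technology such as LLMs, embedding models, and more. With this integration you can use Marqo as the document store for Haystack pipelines such as retrieval augmentation, question answering, and document search.
Griptape enables safe and reliable deployment of LLM-based agents for enterprise applications, and the MarqoVectorStoreDriver gives these agents access to scalable search over your own data. With this integration you can use open-source or custom fine-tuned models through Marqo to deliver relevant results to your LLMs.
With this integration you can use open-source or custom fine-tuned models through Marqo in LangChain applications that have a vector search component. The Marqo vector store implementation can plug into existing chains such as Retrieval QA and Conversational Retrieval QA.
⋙ Hamilton
With this integration you can use open-source or custom fine-tuned models through Marqo in Hamilton LLM applications.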
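Features: Marqo's core features.
mq is the client for the Marqo API.
create_index() creates a new index with default settings. You can optionally specify the model to use, for example mq.create_index("my-first-index", model="hf/all_datasets_v4_MiniLM-L6").
add_documents() takes a list of documents, represented as Python dictionaries, for indexing. tensor_fields names the fields to create vectors for.
The _id field sets a document's ID. If it is omitted, Marqo generates one.
Let's look at the results: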
# let's print out the results:
import pprint
pprint.pprint(results)

{
    'hits': [
        {
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
                           'communications for astronauts',
            '_highlights': [{
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            }],
            '_id': 'article_591',
            '_score': 0.61938936
        },
        {
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': [{'Title': 'The Travels of Marco Polo'}],
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 0.60237324
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}
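Because the response above is a plain dictionary, it can be post-processed with ordinary Python. The sketch below is not part of the Marqo client: `summarize_hits` is a hypothetical helper, and `sample` is a trimmed copy of the response shown above. It reduces each hit to its ID, rounded score, and the names of the highlighted fields.

```python
def summarize_hits(response):
    """Return (_id, rounded _score, highlighted field names) for each hit."""
    rows = []
    for hit in response.get("hits", []):
        highlights = hit.get("_highlights") or []
        # Each highlight is a dict mapping a field name to the matched text.
        fields = [name for highlight in highlights for name in highlight]
        rows.append((hit["_id"], round(hit["_score"], 3), fields))
    return rows


# Trimmed copy of the response printed above.
sample = {
    "hits": [
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "_highlights": [{
                "Description": "The EMU is a spacesuit that provides environmental protection, "
                               "mobility, life support, and communications for astronauts"
            }],
            "_id": "article_591",
            "_score": 0.61938936,
        },
        {
            "Title": "The Travels of Marco Polo",
            "_highlights": [{"Title": "The Travels of Marco Polo"}],
            "_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
            "_score": 0.60237324,
        },
    ],
    "limit": 10,
    "processingTimeMs": 49,
    "query": "What is the best outfit to wear on the moon?",
}

for row in summarize_hits(sample):
    print(row)
```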
Each hit has a _highlights field. This is the part of the document that best matched the query.
Retrieve a document by ID:
result = mq.index("my-first-index").get_document(document_id="article_591")

# get statistics about the index
results = mq.index("my-first-index").get_stats()

# run a lexical (keyword) search instead of a vector search
result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)
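To illustrate how a lexical search differs from a vector search, here is a deliberately simple sketch. It is an assumption-laden toy, not Marqo's ranking function: `lexical_score` is a hypothetical helper that just counts how many distinct query terms appear in a text, and the `"marco"` ID is made up for the example. The point is that lexical matching depends on shared words rather than on meaning.

```python
def lexical_score(query, text):
    """Count the distinct query terms that occur in the text (case-insensitive)."""
    doc_terms = set(text.lower().split())
    return sum(1 for term in set(query.lower().split()) if term in doc_terms)


docs = [
    {"_id": "article_591", "Title": "Extravehicular Mobility Unit (EMU)"},
    {"_id": "marco", "Title": "The Travels of Marco Polo"},  # hypothetical ID
]

ranked = sorted(
    docs,
    key=lambda doc: lexical_score("marco polo", doc["Title"]),
    reverse=True,
)
print([doc["_id"] for doc in ranked])

# A query with no shared words scores zero, even if it is related in meaning.
print(lexical_score("moon outfit", "Extravehicular Mobility Unit (EMU)"))
```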
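To power image and text search, Marqo lets users plug and play with CLIP models from HuggingFace. Note that if you do not configure multimodal search, image URLs are treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below: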
settings = {
    "treat_urls_and_pointers_as_images": True,  # allows us to find an image file and index it
    "model": "ViT-L/14"
}
response = mq.create_index("my-multimodal-index", **settings)
Images can then be added within documents as follows. You can use URLs from the internet (for example S3) or from the disk of the machine:
response = mq.index("my-multimodal-index").add_documents([{
    "My_Image": "",
    "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
    "_id": "hippo-facts"
}], tensor_fields=["My_Image"])

# search the image field using text
results = mq.index("my-multimodal-index").search('animal')

# search using an image by passing its URL as the query
results = mq.index("my-multimodal-index").search('')
import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

mq.create_index("my-weighted-query-index")

mq.index("my-weighted-query-index").add_documents(
    [
        {
            "Title": "Smartphone",
            "Description": "A smartphone is a portable computer device that combines mobile telephone "
                           "functions and computing functions into one unit.",
        },
        {
            "Title": "Telephone",
            "Description": "A telephone is a telecommunications device that permits two or more users to "
                           "conduct a conversation when they are too far apart to be easily heard directly.",
        },
        {
            "Title": "Thylacine",
            "Description": "The thylacine, also commonly known as the Tasmanian tiger or Tasmanian wolf, "
                           "is an extinct carnivorous marsupial. "
                           "The last known of its species died in 1936.",
        },
    ],
    tensor_fields=["Description"],
)

# initially we ask for a type of communications device which is popular in the 21st century
query = {
    # a weighting of 1.1 gives this query slightly more importance
    "I need to buy a communications device, what should I get?": 1.1,
    # a weighting of 1 gives this query a neutral importance
    # this will lead to 'Smartphone' being the top result
    "The device should work like an intelligent computer.": 1.0,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("Query 1:")
pprint.pprint(results)

# now we ask for a type of communications which predates the 21st century
query = {
    # a weighting of 1 gives this query a neutral importance
    "I need to buy a communications device, what should I get?": 1.0,
    # a negative weighting (here -0.3) gives this query a negation effect
    # this will lead to 'Telephone' being the top result
    "The device should work like an intelligent computer.": -0.3,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("\nQuery 2:")
pprint.pprint(results)
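One way to build intuition for the weighted queries above is to treat each query string as a vector and the weights as coefficients of a linear combination. The sketch below is an illustration under that assumption, not a description of Marqo's internals: the three-dimensional vectors are hand-made (axes: communication, computing, animal) and `combine`, `cosine`, and `rank` are hypothetical helpers. It shows how a negative weight moves the combined query away from "computer-like" documents.

```python
import math


def combine(weighted_vectors):
    """Weighted sum of (vector, weight) pairs."""
    dims = len(weighted_vectors[0][0])
    return [sum(w * vec[i] for vec, w in weighted_vectors) for i in range(dims)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors; axes are (communication, computing, animal).
DOCS = {
    "Smartphone": [0.7, 0.7, 0.0],
    "Telephone": [1.0, 0.1, 0.0],
    "Thylacine": [0.0, 0.0, 1.0],
}
COMMS_DEVICE = [1.0, 0.0, 0.0]          # "I need to buy a communications device..."
INTELLIGENT_COMPUTER = [0.0, 1.0, 0.0]  # "The device should work like an intelligent computer."


def rank(weighted_vectors):
    """Return document titles ordered from most to least similar."""
    query = combine(weighted_vectors)
    return sorted(DOCS, key=lambda title: cosine(query, DOCS[title]), reverse=True)


query_1 = rank([(COMMS_DEVICE, 1.1), (INTELLIGENT_COMPUTER, 1.0)])
query_2 = rank([(COMMS_DEVICE, 1.0), (INTELLIGENT_COMPUTER, -0.3)])
print("Query 1:", query_1)
print("Query 2:", query_2)
```

With these toy vectors, the positive weighting puts the smartphone first, and the negative weighting on the computer component puts the telephone first, mirroring the comments in the example above.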
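Marqo lets you create indexes with multimodal combination fields. A multimodal combination field combines text and images into a single field, so documents can be scored across the combined text and image content. It also uses a single vector representation instead of several, which saves storage. The relative weighting of each component can be set per document.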
import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

settings = {"treat_urls_and_pointers_as_images": True, "model": "ViT-L/14"}

mq.create_index("my-first-multimodal-index", **settings)

mq.index("my-first-multimodal-index").add_documents(
    [
        {
            "Title": "Flying Plane",
            "caption": "An image of a passenger plane flying in front of the moon.",
            "image": "",
        },
        {
            "Title": "Red Bus",
            "caption": "A red double decker London bus traveling to Aldwych",
            "image": "",
        },
        {
            "Title": "Horse Jumping",
            "caption": "A person riding a horse over a jump in a competition.",
            "image": "",
        },
    ],
    # Create the mappings, here we define our captioned_image mapping
    # which weights the image more heavily than the caption - these pairs
    # will be represented by a single vector in the index
    mappings={
        "captioned_image": {
            "type": "multimodal_combination",
            "weights": {
                "caption": 0.3,
                "image": 0.7
            }
        }
    },
    # We specify which fields to create vectors for.
    # Note that captioned_image is treated as a single field.
    tensor_fields=["captioned_image"]
)

# Search this index with a simple text query
results = mq.index("my-first-multimodal-index").search(
    q="Give me some images of vehicles and modes of transport. I am especially interested in air travel and commercial aeroplanes."
)

print("Query 1:")
pprint.pprint(results)

# search the index with a query that uses weighted components
results = mq.index("my-first-multimodal-index").search(
    q={
        "What are some vehicles and modes of transport?": 1.0,
        "Aeroplanes and other things that fly": -1.0
    },
)
print("\nQuery 2:")
pprint.pprint(results)

# search the index with a query that has only a negatively weighted component
results = mq.index("my-first-multimodal-index").search(
    q={"Animals of the Perissodactyla order": -1.0}
)
print("\nQuery 3:")
pprint.pprint(results)
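The idea of representing a caption and an image with a single vector can be sketched as a weighted combination of the two component vectors. This is a simplified illustration, not Marqo's implementation: the two-dimensional vectors are made up, `combine_fields` is a hypothetical helper, and normalizing the result to unit length is an assumption made here for clarity. The weights match the `captioned_image` mapping above.

```python
import math


def combine_fields(vectors, weights):
    """Combine per-field vectors into one unit-length vector using the given weights."""
    dims = len(next(iter(vectors.values())))
    combined = [
        sum(weights[name] * vectors[name][i] for name in weights)
        for i in range(dims)
    ]
    norm = math.sqrt(sum(x * x for x in combined))
    return [x / norm for x in combined]


# Made-up vectors standing in for the caption and image embeddings.
vectors = {"caption": [1.0, 0.0], "image": [0.0, 1.0]}
weights = {"caption": 0.3, "image": 0.7}

captioned_image = combine_fields(vectors, weights)
print(captioned_image)
```

Only `captioned_image` would need to be stored, rather than one vector per component, which is where the storage saving described above comes from.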
# delete documents by ID
results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])

# delete the index
results = mq.index("my-first-index").delete()
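We support Kubernetes templates for Marqo, which you can deploy on a cloud provider of your choice. Marqo's Kubernetes implementation lets you deploy clusters with replicas, multiple storage shards, and multiple inference nodes. The repository can be found here:
If you are looking for a fully managed cloud service, you can sign up for Marqo Cloud here:
The full documentation for Marqo can be found here:
Note that you should not run other applications on Marqo's Vespa cluster, because Marqo automatically changes and adapts the settings on the cluster.
Marqo is a community project whose goal is to make tensor search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.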
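Create a virtual environment: python -m venv ./venv
Activate the virtual environment: source ./venv/bin/activate
Install the requirements from the requirements file: pip install -r requirements.txt
Run the tests with tox: cd into this directory, then run tox.
If you update dependencies, make sure to delete the .tox directory and rerun.
Create a pull request with an attached GitHub issue.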