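Marqo

Marqo is more than a vector database; it is an end-to-end vector search engine for both text and images. Vector generation, storage, and retrieval are handled out of the box through a single API. There is no need to bring your own embeddings.

Why Marqo?

Vector similarity alone is not enough for vector search. Vector search requires more than a vector database: it also requires machine learning (ML) deployment and management, preprocessing and transformation of inputs, and the ability to modify search behavior without retraining a model. Marqo contains all of these pieces, so developers can build vector search into their applications with minimal effort. The full list of features is given below.

Why bundle embedding generation with vector search?

Vector databases are specialized components for vector similarity and serve only one component of a vector search system. They are "vectors in, vectors out". They still require the generation of vectors, management of the ML models, and the associated orchestration and processing of inputs. Marqo makes this easy by being "documents in, documents out". Preprocessing of text and images, embedding the content, storing metadata, and deployment of inference and storage are all taken care of by Marqo.

Quick start

Here is a code snippet for a minimal example of vector search with Marqo (see Getting started):

Marqo requires Docker. To install Docker, go to the official Docker website. Ensure that Docker has at least 8 GB of memory and 50 GB of storage. In Docker Desktop, you can do this by clicking the settings icon, then Resources, and selecting 8 GB of memory.
Use docker to run Marqo: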

docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest

Install the Marqo client:

pip install marqo

Start indexing and searching! Let's look at a simple example below:

import marqo

# Connect to the Marqo instance started with docker above
mq = marqo.Client(url='http://localhost:8882')

mq.create_index("my-first-index", model="hf/e5-base-v2")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }],
    # Only the fields listed here are embedded as vectors
    tensor_fields=["Description"]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)

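Core features

State-of-the-art embeddings

Use the latest machine learning models from PyTorch, Huggingface, OpenAI and more.
Start with a pre-configured model or bring your own.
CPU and GPU support.

⚡ Performance

Embeddings are stored in in-memory HNSW indexes, achieving cutting-edge search speeds.
Scale to hundred-million document indexes with horizontal index sharding.
Async and non-blocking data upload and search.

Documents-in-documents-out

Vector generation, storage, and retrieval are provided out of the box.
Build search, entity resolution, and data exploration applications using your text and images.
Build complex semantic queries by combining weighted search terms.
Filter search results using Marqo's query DSL.
Store unstructured data and semi-structured metadata together in documents, using a range of supported datatypes like bools, ints and keywords.

Managed cloud

Low latency optimised deployment of Marqo.
Scale inference at the click of a button.
High availability.
24/7 support.
Access control.
Learn more here.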

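Integrations

Marqo is integrated into popular AI and data processing frameworks, with more on the way.

Haystack

Haystack is an open-source framework for building applications that make use of NLP technology such as LLMs, embedding models and more. This integration allows you to use Marqo as your Document Store for Haystack pipelines such as retrieval-augmentation, question answering, document search and more.

Griptape

Griptape enables safe and reliable deployment of LLM-based agents for enterprise applications, and the MarqoVectorStoreDriver gives these agents access to scalable search with your own data. This integration lets you leverage open source or custom fine-tuned models through Marqo to deliver relevant results to your LLMs.

LangChain

This integration lets you leverage open source or custom fine-tuned models through Marqo for LangChain applications with a vector search component. The Marqo vector store implementation can plug into existing chains such as the Retrieval QA and Conversational Retrieval QA.

⋙ Hamilton

This integration lets you leverage open source or custom fine-tuned models through Marqo for Hamilton LLM applications.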

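Learn more about Marqo

Quick start	Build your first application with Marqo in under 5 minutes.
Marqo for image data	Building advanced image search with Marqo.
Marqo for text	Building a multilingual database in Marqo.
Integrating Marqo with GPT	Making GPT a subject matter expert by using Marqo as a knowledge base.
Marqo for Creative AI	Combining stable diffusion with semantic search to generate and categorize 100k images of hotdogs.
Marqo and speech data	Add diarisation and transcription to preprocess audio for Q&A with Marqo and ChatGPT.
Marqo for content moderation	Building advanced image search with Marqo to find and remove content.
☁️ Getting started with Marqo Cloud	Go through how to get set up and running with Marqo Cloud, from your first login through to building your first application with Marqo.
Marqo for e-commerce	This project is a web application with frontend and backend using Python, Flask, ReactJS, and Typescript. The frontend is a ReactJS application that makes requests to the backend, which is a Flask application. The backend makes requests to the Marqo cloud API.
Marqo chatbot	In this guide we will build a chatbot application using Marqo and OpenAI's ChatGPT API. We will start with an existing code base and then walk through how to customise the behaviour.
Features	Marqo's core features.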

Getting started

Marqo requires Docker. To install Docker, go to the official Docker website. Ensure that Docker has at least 8 GB of memory and 50 GB of storage.
Use docker to run Marqo:

docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -p 8882:8882 marqoai/marqo:latest

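Note: If your marqo container keeps getting killed, this is most likely due to a lack of memory being allocated to Docker. Increasing the memory limit for Docker to at least 6GB (8GB recommended) in your Docker settings may fix the problem.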
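The container can take a little while to start accepting connections after `docker run`. The following is a minimal sketch of a readiness check using only the Python standard library. The helper names `marqo_is_reachable` and `wait_for_marqo` are hypothetical (they are not part of the Marqo client), and the check only tests whether something answers HTTP on the port published above; it makes no assumption about what the server returns.

```python
import time
import urllib.error
import urllib.request

# Bypass any HTTP proxy configured in the environment: the target is a local service.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def marqo_is_reachable(url="http://localhost:8882", timeout_s=2.0):
    """Return True if anything answers HTTP at `url`, False otherwise."""
    try:
        with _opener.open(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status; it is still up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


def wait_for_marqo(url="http://localhost:8882", attempts=30, delay_s=2.0):
    """Poll `url` until it answers or `attempts` checks have failed."""
    for _ in range(attempts):
        if marqo_is_reachable(url):
            return True
        time.sleep(delay_s)
    return False
```

Calling `wait_for_marqo()` before creating the client gives a clear yes/no answer instead of a connection error on the first request. If it keeps returning False, the memory note above is a good first thing to check.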

Install the Marqo client:

pip install marqo

Start indexing and searching! Let's look at a simple example below:

import marqo

# Connect to the Marqo instance started with docker above
mq = marqo.Client(url='http://localhost:8882')

mq.create_index("my-first-index")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }],
    # Only the fields listed here are embedded as vectors
    tensor_fields=["Description"]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)

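mq is the client that wraps the marqo API.
create_index() creates a new index with default settings. You have the option to specify which model to use. For example, mq.create_index("my-first-index", model="hf/all_datasets_v4_MiniLM-L6") will create an index with the default text model hf/all_datasets_v4_MiniLM-L6. Experimentation with different models is often required to achieve the best retrieval for your specific use case. Different models also offer a tradeoff between inference speed and relevancy. For the full list of models, see the Marqo documentation.
add_documents() takes a list of documents, represented as Python dicts, for indexing. tensor_fields refers to the fields that will be indexed as vector collections and made searchable.
You can optionally set a document's ID with the special _id field. Otherwise, Marqo will generate one.

Let's have a look at the results:

# let's print out the results:
import pprint
pprint.pprint(results)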

{
    'hits': [
        {
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
                           'communications for astronauts',
            '_highlights': [{
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            }],
            '_id': 'article_591',
            '_score': 0.61938936
        },
        {
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': [{'Title': 'The Travels of Marco Polo'}],
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 0.60237324
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}

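Each hit corresponds to a document that matched the search query.
They are ordered from most to least matching.
limit is the maximum number of hits to be returned. This can be set as a parameter during search.
Each hit has a _highlights field. This is the part of the document that matched the query best.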
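To make the structure of the response concrete, here is a small sketch that walks the hits and pulls out the ID, score, and highlighted text of each one. The function name `summarise_hits` is hypothetical, and the sample input is a trimmed copy of the response printed above, so the code runs without a Marqo instance. It assumes `_highlights` is a list of dicts, as in that printed response.

```python
def summarise_hits(results):
    """Return (_id, _score, highlighted text) for each hit, in the order returned."""
    rows = []
    for hit in results.get("hits", []):
        # _highlights is assumed to be a list of {field_name: matching_text} dicts.
        highlights = hit.get("_highlights") or [{}]
        highlighted_text = " / ".join(str(text) for text in highlights[0].values())
        rows.append((hit["_id"], hit["_score"], highlighted_text))
    return rows


# A trimmed copy of the response shown above, used here as sample input.
sample_results = {
    "hits": [
        {"_id": "article_591", "_score": 0.61938936,
         "_highlights": [{"Description": "The EMU is a spacesuit"}]},
        {"_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a", "_score": 0.60237324,
         "_highlights": [{"Title": "The Travels of Marco Polo"}]},
    ],
    "limit": 10,
}

for doc_id, score, text in summarise_hits(sample_results):
    print(f"{score:.3f}  {doc_id}  {text}")
```

Because hits arrive ordered from best to worst match, the first row is always the top result.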

Other basic operations

Get document

Retrieve a document by ID.

result = mq.index("my-first-index").get_document(document_id="article_591")

Note that adding a document with add_documents again, using the same _id, will cause the document to be updated.
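Since re-adding a document with the same _id updates it, one way to make repeated indexing runs idempotent is to derive the _id from the document's content rather than let Marqo generate a fresh one each time. The sketch below is an illustration only: `with_stable_ids` is a hypothetical helper, and hashing the Title field is an arbitrary choice made for this example.

```python
import hashlib


def with_stable_ids(documents, key_field="Title"):
    """Return copies of `documents` with a deterministic _id derived from `key_field`.

    Documents that already carry an _id are left unchanged.
    """
    prepared = []
    for doc in documents:
        doc = dict(doc)  # shallow copy so the caller's dicts are not mutated
        if "_id" not in doc:
            digest = hashlib.sha256(str(doc[key_field]).encode("utf-8")).hexdigest()
            doc["_id"] = digest[:32]
        prepared.append(doc)
    return prepared


docs = with_stable_ids([
    {"Title": "The Travels of Marco Polo",
     "Description": "A 13th-century travelogue describing Polo's travels"},
    {"Title": "Extravehicular Mobility Unit (EMU)", "_id": "article_591"},
])
print([doc["_id"] for doc in docs])
```

Passing `docs` to add_documents on every run would then update the existing entries instead of creating duplicates with new generated IDs.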

Get index stats

Get information about an index.

results = mq.index("my-first-index").get_stats()

Lexical search

Perform a keyword search.

result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)
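When deciding between keyword and vector search for a given query, it can help to run both and see which documents each one finds. The sketch below does that for any object with a Marqo-style `.search()` method. `compare_search_methods` is a hypothetical helper, and the default method names "TENSOR" and "LEXICAL" are assumptions about the values the client accepts; pass `marqo.SearchMethods.LEXICAL` (as used above) or other values explicitly if they differ in your client version.

```python
def compare_search_methods(index, query, limit=5, tensor="TENSOR", lexical="LEXICAL"):
    """Run `query` with both search methods and group hit IDs by which method found them.

    `index` is anything with a Marqo-style .search() method, for example
    mq.index("my-first-index"). The default method names are assumptions.
    """
    tensor_ids = [hit["_id"] for hit in
                  index.search(query, search_method=tensor, limit=limit)["hits"]]
    lexical_ids = [hit["_id"] for hit in
                   index.search(query, search_method=lexical, limit=limit)["hits"]]
    return {
        "both": [i for i in tensor_ids if i in lexical_ids],
        "tensor_only": [i for i in tensor_ids if i not in lexical_ids],
        "lexical_only": [i for i in lexical_ids if i not in tensor_ids],
    }
```

With a running instance, `compare_search_methods(mq.index("my-first-index"), "marco polo")` would return three lists of document IDs. Documents that appear only under "tensor_only" matched on meaning without sharing the query's keywords.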

Multimodal and cross-modal search

To power image and text search, Marqo allows users to plug and play with CLIP models from HuggingFace. Note that if you do not configure multimodal search, image URLs will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:

settings = {
    "treat_urls_and_pointers_as_images": True,  # allows us to find an image file and index it
    "model": "ViT-L/14"
}
response = mq.create_index("my-multimodal-index", **settings)

Images can then be added within documents as follows. You can use URLs from the internet (for example S3) or from the disk of the machine:

response = mq.index("my-multimodal-index").add_documents([{
    "My_Image": "https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_realistic.png",
    "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
    "_id": "hippo-facts"
}], tensor_fields=["My_Image"])

You can then search the image field using text.

results = mq.index("my-multimodal-index").search('animal')

Searching using an image

Searching using an image can be achieved by providing the image link.

results = mq.index("my-multimodal-index").search('https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_statue.png')

Searching using weights in queries

Queries can also be provided as dictionaries where each key is a query and the corresponding value is a weight. This allows for more advanced queries consisting of multiple components, with weightings towards or against them. Queries can include negations via negative weighting.

The example below shows the application of this to a scenario where a user may want to ask a question but also negate results that match a certain semantic criterion.

import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

mq.create_index("my-weighted-query-index")

mq.index("my-weighted-query-index").add_documents(
    [
        {
            "Title": "Smartphone",
            "Description": "A smartphone is a portable computer device that combines mobile telephone "
            "functions and computing functions into one unit.",
        },
        {
            "Title": "Telephone",
            "Description": "A telephone is a telecommunications device that permits two or more users to "
            "conduct a conversation when they are too far apart to be easily heard directly.",
        },
        {
            "Title": "Thylacine",
            "Description": "The thylacine, also commonly known as the Tasmanian tiger or Tasmanian wolf, "
            "is an extinct carnivorous marsupial. "
            "The last known of its species died in 1936.",
        }
    ],
    tensor_fields=["Description"]
)

# initially we ask for a type of communications device which is popular in the 21st century
query = {
    # a weighting of 1.1 gives this query slightly more importance
    "I need to buy a communications device, what should I get?": 1.1,
    # a weighting of 1 gives this query a neutral importance
    # this will lead to 'Smartphone' being the top result
    "The device should work like an intelligent computer.": 1.0,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("Query 1:")
pprint.pprint(results)

# now we ask for a type of communications which predates the 21st century
query = {
    # a weighting of 1 gives this query a neutral importance
    "I need to buy a communications device, what should I get?": 1.0,
    # a negative weighting (here -0.3) gives this query a negation effect
    # this will lead to 'Telephone' being the top result
    "The device should work like an intelligent computer.": -0.3,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("\nQuery 2:")
pprint.pprint(results)
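To build intuition for why a negative weight flips the top result in the example above, here is a toy model in plain Python. It is an illustration only: it assumes the weighted query parts are embedded separately and combined as a weighted sum before being compared to documents by cosine similarity, which is one plausible reading and not a description of Marqo's internals. The two-dimensional "embeddings" are invented for the example.

```python
import math


def weighted_sum(vectors, weights):
    """Combine equal-length vectors into one vector using the given weights."""
    dims = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical 2-D embeddings: axis 0 ~ "communications device", axis 1 ~ "computer-like".
query_device = [1.0, 0.0]
query_computer = [0.0, 1.0]
documents = {
    "Smartphone": [0.7, 0.7],
    "Telephone": [1.0, 0.1],
}


def rank(weights):
    """Rank the toy documents against the weighted combination of both query parts."""
    combined = weighted_sum([query_device, query_computer], weights)
    return sorted(documents, key=lambda name: cosine(combined, documents[name]), reverse=True)


print(rank([1.1, 1.0]))   # ['Smartphone', 'Telephone']
print(rank([1.0, -0.3]))  # ['Telephone', 'Smartphone']
```

With both weights positive, the combined query points towards documents that are both device-like and computer-like. The negative weight pushes the combined vector away from the computer-like direction, so the plain telephone scores higher.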

Creating and searching indexes with multimodal combination fields

Marqo lets you create indexes with multimodal combination fields. Multimodal combination fields can combine text and images into one field. This allows documents to be scored across the combined text and image fields together. It also allows for a single vector representation instead of needing many, which saves on storage. The relative weighting of each component can be set per document.

The example below demonstrates this by retrieving caption and image pairs using multiple types of queries.

import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

settings = {"treat_urls_and_pointers_as_images": True, "model": "ViT-L/14"}

mq.create_index("my-first-multimodal-index", **settings)

mq.index("my-first-multimodal-index").add_documents(
    [
        {
            "Title": "Flying Plane",
            "caption": "An image of a passenger plane flying in front of the moon.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
        },
        {
            "Title": "Red Bus",
            "caption": "A red double decker London bus traveling to Aldwych",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg",
        },
        {
            "Title": "Horse Jumping",
            "caption": "A person riding a horse over a jump in a competition.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image1.jpg",
        },
    ],
    # Create the mappings, here we define our captioned_image mapping
    # which weights the image more heavily than the caption - these pairs
    # will be represented by a single vector in the index
    mappings={
        "captioned_image": {
            "type": "multimodal_combination",
            "weights": {
                "caption": 0.3,
                "image": 0.7
            }
        }
    },
    # We specify which fields to create vectors for.
    # Note that captioned_image is treated as a single field.
    tensor_fields=["captioned_image"]
)

# Search this index with a simple text query
results = mq.index("my-first-multimodal-index").search(
    q="Give me some images of vehicles and modes of transport. I am especially interested in air travel and commercial aeroplanes."
)

print("Query 1:")
pprint.pprint(results)

# search the index with a query that uses weighted components
results = mq.index("my-first-multimodal-index").search(
    q={
        "What are some vehicles and modes of transport?": 1.0,
        "Aeroplanes and other things that fly": -1.0
    },
)
print("\nQuery 2:")
pprint.pprint(results)

results = mq.index("my-first-multimodal-index").search(
    q={"Animals of the Perissodactyla order": -1.0}
)
print("\nQuery 3:")
pprint.pprint(results)
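A mapping like the one above ties together three things that are easy to get out of step: the mapping name, the field names under "weights", and the tensor_fields list. The sketch below is a hypothetical pre-flight check (`check_multimodal_mapping` is not part of the Marqo client) that reports mismatches before any documents are sent. Whether Marqo itself rejects or tolerates each of these cases is not stated here, so treat the output as hints rather than errors.

```python
def check_multimodal_mapping(documents, mappings, tensor_fields):
    """Return a list of human-readable findings; an empty list means none were found."""
    findings = []
    for name, mapping in mappings.items():
        if mapping.get("type") != "multimodal_combination":
            continue  # only combination fields are checked here
        if name not in tensor_fields:
            findings.append(f"{name!r} is not listed in tensor_fields")
        for position, doc in enumerate(documents):
            missing = [field for field in mapping.get("weights", {}) if field not in doc]
            if missing:
                findings.append(f"document {position} lacks {missing} needed by {name!r}")
    return findings


# Same mapping shape as in the example above; the second document is
# deliberately missing its image field.
example_mappings = {
    "captioned_image": {
        "type": "multimodal_combination",
        "weights": {"caption": 0.3, "image": 0.7},
    }
}
example_documents = [
    {"Title": "Flying Plane", "caption": "A passenger plane.", "image": "image2.jpg"},
    {"Title": "Red Bus", "caption": "A red double decker London bus."},
]

for finding in check_multimodal_mapping(example_documents, example_mappings, ["captioned_image"]):
    print(finding)
```

Running a check like this before add_documents keeps typos in field names from silently producing vectors built from only part of the intended content.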

Delete documents

Delete documents.

results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])

Delete index

Delete an index.

results = mq.index("my-first-index").delete()

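Running Marqo open source in production

We support Kubernetes templates for Marqo which you can deploy on a cloud provider of your choice. Marqo's Kubernetes implementation allows you to deploy clusters with replicas, multiple storage shards and multiple inference nodes. The repo can be found here: https://github.com/marqo-ai/marqo-on-kubernetes

If you're looking for a fully managed cloud service, you can sign up for Marqo Cloud at https://cloud.marqo.ai.

Documentation

The full documentation for Marqo can be found at https://docs.marqo.ai/.

Warning

Do not run other applications on Marqo's Vespa cluster, because Marqo automatically changes and adapts the settings on the cluster.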

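Contributors

Marqo is a community project with the goal of making tensor search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.

Dev setup

Create a virtual env: python -m venv ./venv.
Activate the virtual environment: source ./venv/bin/activate.
Install requirements from the requirements file: pip install -r requirements.txt.
Run tests by running the tox file. cd into this directory, then run "tox".
If you update dependencies, make sure to delete the .tox directory and rerun.

Merge instructions:

Run the full test suite (by using the command tox in this directory).
Create a pull request with an attached GitHub issue.

Support

Ask questions and share your creations with the community on our Discourse forum.
Join our Slack community and chat with other community members about ideas.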
