clip as serviceダウンロード

clip as service

その他のソースコード

v0.8.3

ダウンロード

クリップとしてのサービスロゴ：非構造化データのデータ構造

クリップとしてのサービスは、画像とテキストを埋め込むための低遅延の高スケーラビリティサービスです。マイクロサービスとしてニューラル検索ソリューションに簡単に統合できます。

fast Fast ：Tensorrt、ONNX Runtime、Pytorchを備えたクリップモデルを800qps ^[*]で提供します。ノンブロッキングデュプレックスストリーミングリクエストと応答に応じて、大規模なデータと長期にわたるタスク用に設計されています。

？弾性：自動ロードバランスを備えた単一GPU上の複数のクリップモデルを水平にスケーリングします。

？使いやすい：学習曲線なし、クライアントとサーバーのミニマリストデザイン。画像と文の埋め込みのための直感的で一貫したAPI。

？モダン：非同期クライアントサポート。 TLSと圧縮を備えたGRPC、HTTP、WebSocketプロトコルを簡単に切り替えます。

？統合：JinaやDocarrayを含むニューラル検索エコシステムとのスムーズな統合。クロスモーダルとマルチモーダルソリューションをすぐに構築します。

^{[*] geforce rtx 3090のデフォルト構成（シングルレプリカ、pytorch no jit）を使用。}

テキストと画像の埋め込み

https経由で？

grpc？⚡⚡

curl 
-X POST https:// < your-inference-address > -http.wolf.jina.ai/post 
-H ' Content-Type: application/json ' 
-H ' Authorization: <your access token> ' 
-d ' {"data":[{"text": "First do it"}, 
    {"text": "then do it right"}, 
    {"text": "then do it better"}, 
    {"uri": "https://picsum.photos/200"}], 
    "execEndpoint":"/"} '

 # pip install clip-client
from clip_client import Client

c = Client (
    'grpcs://<your-inference-address>-grpc.wolf.jina.ai' ,
    credential = { 'Authorization' : '<your access token>' },
)

r = c . encode (
    [
        'First do it' ,
        'then do it right' ,
        'then do it better' ,
        'https://picsum.photos/200' ,
    ]
)
print ( r )

視覚的な推論

4つの基本的な視覚的推論スキルがあります：オブジェクト認識、オブジェクトカウント、カラー認識、空間関係の理解。試してみましょう：

jq （JSONプロセッサ）をインストールして、結果を軽視する必要があります。

画像	https経由で？
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/1/300/300", "matches": [{"text": "there is a woman in the photo"}, {"text": "there is a man in the photo"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 与える： `"there is a woman in the photo" 0.626907229423523 "there is a man in the photo" 0.37309277057647705`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/133/300/300", "matches": [ {"text": "the blue car is on the left, the red car is on the right"}, {"text": "the blue car is on the right, the red car is on the left"}, {"text": "the blue car is on top of the red car"}, {"text": "the blue car is below the red car"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 与える： `"the blue car is on the left, the red car is on the right" 0.5232442617416382 "the blue car is on the right, the red car is on the left" 0.32878655195236206 "the blue car is below the red car" 0.11064132302999496 "the blue car is on top of the red car" 0.03732786327600479`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/102/300/300", "matches": [{"text": "this is a photo of one berry"}, {"text": "this is a photo of two berries"}, {"text": "this is a photo of three berries"}, {"text": "this is a photo of four berries"}, {"text": "this is a photo of five berries"}, {"text": "this is a photo of six berries"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 与える： `"this is a photo of three berries" 0.48507222533226013 "this is a photo of four berries" 0.2377079576253891 "this is a photo of one berry" 0.11304923892021179 "this is a photo of five berries" 0.0731358453631401 "this is a photo of two berries" 0.05045759305357933 "this is a photo of six berries" 0.04057715833187103`

ドキュメント

インストール

クリップとしてのサービスは、2つのPythonパッケージclip-serverとclip-clientで構成され、独立してインストールできます。どちらもPython 3.7+が必要です。

サーバーをインストールします

Pytorchランタイム⚡	onnxランタイム⚡⚡	Tensorrtランタイム⚡⚡⚡
pip install clip-server	pip install " clip-server[onnx] "	pip install nvidia-pyindex pip install " clip-server[tensorrt] "

Google Colabでサーバーをホストして、無料のGPU/TPUを活用することもできます。

クライアントをインストールします

pip install clip-client

クイックチェック

インストール後に簡単な接続チェックを実行できます。

c/s	指示	出力を期待してください
サーバ	python -m clip_server
クライアント	from clip_client import Client c = Client ( 'grpc://0.0.0.0:23456' ) c . profile ()

0.0.0.0イントラネットまたはパブリックIPアドレスに変更して、プライベートおよびパブリックネットワーク上の接続性をテストできます。

始めましょう

基本的な使用法

サーバーを起動： python -m clip_server 。そのアドレスとポートを覚えておいてください。

クライアントを作成します：

 from clip_client import Client

 c = Client ( 'grpc://0.0.0.0:51000' )

文の埋め込みを取得するには：

 r = c . encode ([ 'First do it' , 'then do it right' , 'then do it better' ])

print ( r . shape )  # [3, 512]

画像の埋め込みを取得するには：

 r = c . encode ([ 'apple.png' ,  # local image 
              'https://clip-as-service.jina.ai/_static/favicon.png' ,  # remote image
              'data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7' ])  # in image URI

print ( r . shape )  # [3, 512]

より包括的なサーバーとクライアントのユーザーガイドは、ドキュメントにあります。

10行のテキストからイメージのクロスモーダル検索

Clip-As-Serviceを使用して、テキストから画像への検索を作成しましょう。つまり、ユーザーは文を入力でき、プログラムは一致する画像を返します。完全にデータセットとDocarrayパッケージのように見えるものを使用します。 Docarrayは上流の依存関係としてclip-clientに含まれているため、個別にインストールする必要はありません。

画像を読み込みます

最初に画像をロードします。 Jina Cloudから引っ張ることができます。

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-original' , show_progress = True , local_cache = True )

または、TTLデータセット、UNZIP、手動でロードすることをダウンロードします

または、公式ウェブサイトのように見える、画像を解凍し、ロードすることもできます。

 from docarray import DocumentArray

da = DocumentArray . from_files ([ 'left/*.jpg' , 'right/*.jpg' ])

データセットには12,032枚の画像が含まれているため、引っ張るのに時間がかかる場合があります。完了したら、それを視覚化して、それらの画像の最初の味を得ることができます。

 da . plot_image_sprites ()

画像の視覚化は完全にデータセットのように見えます

画像をエンコードします

python -m clip_serverでサーバーを起動します。 GRPCプロトコルを使用して0.0.0.0:51000であるとしましょう（サーバーを実行した後、この情報を取得します）。

Pythonクライアントスクリプトを作成します。

 from clip_client import Client

c = Client ( server = 'grpc://0.0.0.0:51000' )

da = c . encode ( da , show_progress = True )

GPUおよびクライアントサーバーネットワークに応じて、12K画像を埋め込むのに時間がかかる場合があります。私の場合、約2分かかりました。

事前にエンコードされたデータセットをダウンロードします

あなたがせっかちであるか、GPUを持っていない場合、待つことは地獄になる可能性があります。この場合、事前にエンコードされた画像データセットを引くことができます。

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-embedding' , show_progress = True , local_cache = True )

文で検索します

ユーザーが文を入力できるようにするための簡単なプロンプトを作成しましょう。

 while True :
    vec = c . encode ([ input ( 'sentence> ' )])
    r = da . find ( query = vec , limit = 9 )
    r [ 0 ]. plot_image_sprites ()

ショーケース

これで、任意の英語文を入力して、一致する上位の画像を表示できます。検索は高速で本能的です。楽しんでみましょう：

「ハッピーポテト」	「超邪悪なai」	「ハンバーガーを楽しんでいる男」

「キャット教授はとても深刻です」	「自我エンジニアは親と一緒に住んでいます」	「明日はないので、不健康な食べましょう」

次の例の埋め込み結果を保存しましょう。

 da . save_binary ( 'ttl-image' )

10行で画像間クロスモーダル検索

また、最後のプログラムの入力と出力を切り替えて、画像間検索を実現することもできます。正確には、クエリ画像が与えられた場合、画像を最もよく説明する文を見つけます。

本「Pride and Prejudice」のすべての文を使用しましょう。

 from docarray import Document , DocumentArray

d = Document ( uri = 'https://www.gutenberg.org/files/1342/1342-0.txt' ). load_uri_to_text ()
da = DocumentArray (
    Document ( text = s . strip ()) for s in d . text . replace ( ' r n ' , '' ). split ( '.' ) if s . strip ()
)

私たちが得たものを見てみましょう：

 da . summary ()

            Documents Summary            
                                         
  Length                 6403            
  Homogenous Documents   True            
  Common Attributes      ('id', 'text')  
                                         
                     Attributes Summary                     
                                                            
  Attribute   Data type   #Unique values   Has empty value  
 ────────────────────────────────────────────────────────── 
  id          ('str',)    6403             False            
  text        ('str',)    6030             False

文をエンコードします

これらの6,403文をエンコードすると、GPUとネットワークに応じて10秒以内にかかる場合があります。

 from clip_client import Client

c = Client ( 'grpc://0.0.0.0:51000' )

r = c . encode ( da , show_progress = True )

事前にエンコードされたデータセットをダウンロードします

繰り返しますが、焦りがちな人やGPUを持っていない人のために、事前にエンコードされたテキストデータセットを準備しました。

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-textual' , show_progress = True , local_cache = True )

画像を介して検索します

以前に保存されていた画像の埋め込みをロードし、10個の画像ドキュメントをランダムにサンプリングしてから、それぞれのトップ1近くの隣接を見つけましょう。

 from docarray import DocumentArray

img_da = DocumentArray . load_binary ( 'ttl-image' )

for d in img_da . sample ( 10 ):
    print ( da . find ( d . embedding , limit = 1 )[ 0 ]. text )

ショーケース

楽しい時間！注意して、前の例とは異なり、ここでは入力は画像であり、文は出力です。すべての文章は、本「プライドと偏見」から来ています。


その上、彼のルックスには真実がありました	ガーディナーは微笑んだ	彼のお名前は	しかし、ティータイムまでに、用量は十分であり、MR	あなたはよく見えません


「ゲーマン！」彼女は泣いた	ベルで私の名前に言及すれば、あなたは	リジーの髪を見逃さないでください	エリザベスはまもなく氏の妻になります	私は前の晩に彼らを見ました

クリップモデルを介して画像テキストマッチをランク付けします

0.3.0からクリップとしてのサービスは、クリップモデルの共同可能性に応じてクロスモーダルマッチを再ランクする新しい/rankエンドポイントを追加します。たとえば、以下のように、いくつかの事前定義された文が一致する画像ドキュメントが与えられています。

 from clip_client import Client
from docarray import Document

c = Client ( server = 'grpc://0.0.0.0:51000' )
r = c . rank (
    [
        Document (
            uri = '.github/README-img/rerank.png' ,
            matches = [
                Document ( text = f'a photo of a { p } ' )
                for p in (
                    'control room' ,
                    'lecture room' ,
                    'conference room' ,
                    'podium indoor' ,
                    'television studio' ,
                )
            ],
        )
    ]
)

print ( r [ '@m' , [ 'text' , 'scores__clip_score__value' ]])

 [['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'], 
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]

a photo of a television studio 0.992のclip_scoreスコアでトップにランクされていることがわかります。実際には、このエンドポイントを使用して、他の検索システムからマッチング結果を再ランクし、クロスモーダルの検索品質を向上させることができます。

クリップモデルを介してテキストイメージの一致をランク付けします

Dall・E Flowプロジェクトでは、ClipがDall・eから生成された結果をランク付けするために呼び出されます。 clip-clientの上に巻き付けられたエグゼキューターがあり、 .arank() - .rank()の非同期バージョンを呼び出します。

 from clip_client import Client
from jina import Executor , requests , DocumentArray


class ReRank ( Executor ):
    def __init__ ( self , clip_server : str , ** kwargs ):
        super (). __init__ ( ** kwargs )
        self . _client = Client ( server = clip_server )

    @ requests ( on = '/' )
    async def rerank ( self , docs : DocumentArray , ** kwargs ):
        return await self . _client . arank ( docs )

ダレフローで使用されるサービスとしてのクリップ

興味をそそられましたか？それは、サービスとしてのクリップの表面を引っ掻くだけです。詳細については、ドキュメントを読んでください。

サポート

私たちの不一致コミュニティに参加し、アイデアについて他のコミュニティメンバーとチャットしてください。
エンジニアリングをすべて視聴して、ジナの新機能を学び、最新のAIテクニックを最新の状態に保ちます。
YouTubeチャンネルで最新のビデオチュートリアルを購読する

参加しませんか

クリップとしてのサービスは、Jina AIに支えられ、Apache-2.0に基づいてライセンスされています。私たちは、AIエンジニアであるソリューションエンジニアを積極的に採用して、オープンソースで次のニューラル検索エコシステムを構築しています。

拡大する

追加情報

バージョン v0.8.3
タイプその他のソースコード
更新時間 2025-03-02
サイズ 11.55MB
から Github

画像	https経由で？
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/1/300/300", "matches": [{"text": "there is a woman in the photo"}, {"text": "there is a man in the photo"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 与える： `"there is a woman in the photo" 0.626907229423523 "there is a man in the photo" 0.37309277057647705`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/133/300/300", "matches": [ {"text": "the blue car is on the left, the red car is on the right"}, {"text": "the blue car is on the right, the red car is on the left"}, {"text": "the blue car is on top of the red car"}, {"text": "the blue car is below the red car"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 与える： `"the blue car is on the left, the red car is on the right" 0.5232442617416382 "the blue car is on the right, the red car is on the left" 0.32878655195236206 "the blue car is below the red car" 0.11064132302999496 "the blue car is on top of the red car" 0.03732786327600479`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/102/300/300", "matches": [{"text": "this is a photo of one berry"}, {"text": "this is a photo of two berries"}, {"text": "this is a photo of three berries"}, {"text": "this is a photo of four berries"}, {"text": "this is a photo of five berries"}, {"text": "this is a photo of six berries"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 与える： `"this is a photo of three berries" 0.48507222533226013 "this is a photo of four berries" 0.2377079576253891 "this is a photo of one berry" 0.11304923892021179 "this is a photo of five berries" 0.0731358453631401 "this is a photo of two berries" 0.05045759305357933 "this is a photo of six berries" 0.04057715833187103`

clip as service

テキストと画像の埋め込み

視覚的な推論

ドキュメント

インストール

サーバーをインストールします

クライアントをインストールします

クイックチェック

始めましょう

基本的な使用法

10行のテキストからイメージのクロスモーダル検索

画像を読み込みます

画像をエンコードします

文で検索します

ショーケース

10行で画像間クロスモーダル検索

文をエンコードします

画像を介して検索します

ショーケース

クリップモデルを介して画像テキストマッチをランク付けします

クリップモデルを介してテキストイメージの一致をランク付けします

サポート

参加しませんか

Inf CLIP

フルサービス中国語版

穴として

ホールソフトウェアとして

ヘッドASコード

クリップバケット

chat.petals.dev

GPT Prompt Templates

GPTyped

waymo open dataset

chat.petals.dev

Sunamu

waymo open dataset

termwind

wp functions