clip as service Download - clip as service Source Code Download

clip as service

Anderer Quellcode

v0.8.3

Herunterladen

Clip-as-Service-Logo: Die Datenstruktur für unstrukturierte Daten

Clip-as-Service ist ein hochkalkalierbarer Service mit niedriger Latenz zum Einbetten von Bildern und Text. Es kann leicht als Mikroservice in neuronale Suchlösungen integriert werden.

⚡ Schnell : Servieren Sie Clip -Modelle mit Tensorrt, Onnx -Laufzeit und Pytorch mit JIT mit 800QPs ^[*] . Nicht blockierter Duplex-Streaming auf Anfragen und Antworten, die für große Daten und langlebige Aufgaben entwickelt wurden.

? GLASTISCH : Horizontal skalieren Sie mehrere Clip -Modelle für einzelne GPU mit automatischem Lastausgleich.

? Einfach zu bedienende : Keine Lernkurve, minimalistisches Design auf Client und Server. Intuitive und konsistente API für Bild- und Satzeinbettung.

? Modern : Async Client Support. Wechseln Sie einfach zwischen GRPC, HTTP, WebSocket -Protokollen mit TLS und Komprimierung.

? Integration : Glätte Integration in das Ökosystem der neuronalen Suche einschließlich Jina und Docarray. Bauen Sie in kürzester Zeit Kreuzmodale und multimodale Lösungen auf.

^{[*] mit Standardkonfiguration (Einzelreplik, Pytorch no JIT) auf GeForce RTX 3090.}

Text- und Bildeinbettung

via https?

über GRPC? ⚡⚡

curl 
-X POST https:// < your-inference-address > -http.wolf.jina.ai/post 
-H ' Content-Type: application/json ' 
-H ' Authorization: <your access token> ' 
-d ' {"data":[{"text": "First do it"}, 
    {"text": "then do it right"}, 
    {"text": "then do it better"}, 
    {"uri": "https://picsum.photos/200"}], 
    "execEndpoint":"/"} '

 # pip install clip-client
from clip_client import Client

c = Client (
    'grpcs://<your-inference-address>-grpc.wolf.jina.ai' ,
    credential = { 'Authorization' : '<your access token>' },
)

r = c . encode (
    [
        'First do it' ,
        'then do it right' ,
        'then do it better' ,
        'https://picsum.photos/200' ,
    ]
)
print ( r )

Visuelle Argumentation

Es gibt vier grundlegende Fähigkeiten zum visuellen Denken: Objekterkennung, Objektzählen, Farbkennung und räumliches Verständnis der Beziehung. Versuchen wir einige:

Sie müssen jq (einen JSON -Prozessor) installieren, um die Ergebnisse zu ermutigen.

Bild	via https?
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/1/300/300", "matches": [{"text": "there is a woman in the photo"}, {"text": "there is a man in the photo"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " gibt: `"there is a woman in the photo" 0.626907229423523 "there is a man in the photo" 0.37309277057647705`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/133/300/300", "matches": [ {"text": "the blue car is on the left, the red car is on the right"}, {"text": "the blue car is on the right, the red car is on the left"}, {"text": "the blue car is on top of the red car"}, {"text": "the blue car is below the red car"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " gibt: `"the blue car is on the left, the red car is on the right" 0.5232442617416382 "the blue car is on the right, the red car is on the left" 0.32878655195236206 "the blue car is below the red car" 0.11064132302999496 "the blue car is on top of the red car" 0.03732786327600479`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/102/300/300", "matches": [{"text": "this is a photo of one berry"}, {"text": "this is a photo of two berries"}, {"text": "this is a photo of three berries"}, {"text": "this is a photo of four berries"}, {"text": "this is a photo of five berries"}, {"text": "this is a photo of six berries"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " gibt: `"this is a photo of three berries" 0.48507222533226013 "this is a photo of four berries" 0.2377079576253891 "this is a photo of one berry" 0.11304923892021179 "this is a photo of five berries" 0.0731358453631401 "this is a photo of two berries" 0.05045759305357933 "this is a photo of six berries" 0.04057715833187103`

Dokumentation

Installieren

Clip-as-Service besteht aus zwei Python-Paketen clip-server und clip-client , die unabhängig voneinander installiert werden können. Beide erfordern Python 3.7+.

Server installieren

Pytorch -Laufzeit ⚡	Onnx -Laufzeit ⚡⚡	Tensorrt Runtime ⚡⚡⚡
pip install clip-server	pip install " clip-server[onnx] "	pip install nvidia-pyindex pip install " clip-server[tensorrt] "

Sie können den Server auch auf Google Colab hosten und die kostenlose GPU/TPU nutzen.

Client installieren

pip install clip-client

Schnelle Überprüfung

Sie können nach der Installation eine einfache Konnektivitätsprüfung ausführen.

C/s	Befehl	Erwarten Sie Output
Server	python -m clip_server
Kunde	from clip_client import Client c = Client ( 'grpc://0.0.0.0:23456' ) c . profile ()

Sie können 0.0.0.0 in die Intranet- oder öffentliche IP -Adresse ändern, um die Konnektivität über das private und öffentliche Netzwerk zu testen.

Fangen an

Grundnutzung

Starten Sie den Server: python -m clip_server . Erinnere dich an seine Adresse und Port.

Einen Kunden erstellen:

 from clip_client import Client

 c = Client ( 'grpc://0.0.0.0:51000' )

Strafverbettung erhalten:

 r = c . encode ([ 'First do it' , 'then do it right' , 'then do it better' ])

print ( r . shape )  # [3, 512]

Um Bildeinbettung zu erhalten:

 r = c . encode ([ 'apple.png' ,  # local image 
              'https://clip-as-service.jina.ai/_static/favicon.png' ,  # remote image
              'data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7' ])  # in image URI

print ( r . shape )  # [3, 512]

In den DOCs finden Sie umfassendere Leitfäden für Server- und Client -Benutzer.

Text-to-Image-Quermodal-Suche in 10 Zeilen

Erstellen wir eine Text-zu-im-Image-Suche mit Clip-as-Service. Ein Benutzer kann nämlich einen Satz eingeben und das Programm gibt übereinstimmende Bilder zurück. Wir werden das komplett aussehen wie Datensatz- und Docarray -Paket. Beachten Sie, dass DocArray als Upstream-Abhängigkeit in clip-client enthalten ist, sodass Sie sie nicht separat installieren müssen.

Bilder laden

Zuerst laden wir Bilder. Sie können sie einfach von Jina Cloud ziehen:

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-original' , show_progress = True , local_cache = True )

Oder laden Sie den TTL -Datensatz herunter, entpacken Sie manuell laden Sie

Alternativ können Sie sich auf die offizielle Website, die Unzipp- und Laden Sie Bilder aus:

 from docarray import DocumentArray

da = DocumentArray . from_files ([ 'left/*.jpg' , 'right/*.jpg' ])

Der Datensatz enthält 12.032 Bilder, sodass es eine Weile dauern kann, bis sie ziehen. Sobald Sie fertig sind, können Sie es sich vorstellen und den ersten Geschmack dieser Bilder erhalten:

 da . plot_image_sprites ()

Visualisierung des Bildsprites von Total sieht aus wie Datensatz

Codieren Bilder

Starten Sie den Server mit python -m clip_server . Nehmen wir an, es ist bei 0.0.0.0:51000 51000 mit GRPC -Protokoll (Sie erhalten diese Informationen, nachdem Sie den Server ausgeführt haben).

Erstellen Sie ein Python -Client -Skript:

 from clip_client import Client

c = Client ( server = 'grpc://0.0.0.0:51000' )

da = c . encode ( da , show_progress = True )

Abhängig von Ihrem GPU- und Client-Server-Netzwerk kann es eine Weile dauern, bis 12K-Bilder eingebettet sind. In meinem Fall dauerte es ungefähr zwei Minuten.

Laden Sie den vorgezogenen Datensatz herunter

Wenn Sie ungeduldig sind oder keine GPU haben, kann das Warten die Hölle sein. In diesem Fall können Sie einfach unseren vorgekündigten Bilddatensatz ziehen:

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-embedding' , show_progress = True , local_cache = True )

Suche per Satz

Erstellen wir eine einfache Eingabeaufforderung, damit ein Benutzer Satz eingeben kann:

 while True :
    vec = c . encode ([ input ( 'sentence> ' )])
    r = da . find ( query = vec , limit = 9 )
    r [ 0 ]. plot_image_sprites ()

Showcase

Jetzt können Sie beliebige englische Sätze eingeben und die Top-9-Matching-Bilder anzeigen. Die Suche ist schnell und instinktiv. Lassen Sie uns Spaß haben:

"Eine glückliche Kartoffel"	"Eine super böse Ai"	"Ein Mann, der seinen Burger genießt"

"Professor Cat ist sehr ernst"	"Ein Ego -Ingenieur lebt mit Eltern"	"Es wird kein Morgen geben, also lass uns ungesund essen"

Sparen wir das Einbettungsergebnis für unser nächstes Beispiel:

 da . save_binary ( 'ttl-image' )

Bild-zu-Text-Quermodal-Suche in 10 Zeilen

Wir können auch die Eingabe und Ausgabe des letzten Programms wechseln, um eine Bild-zu-Text-Suche zu erzielen. Genau bei einem Abfragebild finden Sie den Satz, der das Bild am besten beschreibt.

Lassen Sie uns alle Sätze aus dem Buch "Pride and Prejudice" verwenden.

 from docarray import Document , DocumentArray

d = Document ( uri = 'https://www.gutenberg.org/files/1342/1342-0.txt' ). load_uri_to_text ()
da = DocumentArray (
    Document ( text = s . strip ()) for s in d . text . replace ( ' r n ' , '' ). split ( '.' ) if s . strip ()
)

Schauen wir uns an, was wir haben:

 da . summary ()

            Documents Summary            
                                         
  Length                 6403            
  Homogenous Documents   True            
  Common Attributes      ('id', 'text')  
                                         
                     Attributes Summary                     
                                                            
  Attribute   Data type   #Unique values   Has empty value  
 ────────────────────────────────────────────────────────── 
  id          ('str',)    6403             False            
  text        ('str',)    6030             False

Sätze codieren

Nehmen Sie nun diese 6.403 Sätze ein. Abhängig von Ihrer GPU und Ihrem Netzwerk kann es 10 Sekunden oder weniger dauern:

 from clip_client import Client

c = Client ( 'grpc://0.0.0.0:51000' )

r = c . encode ( da , show_progress = True )

Laden Sie den vorgezogenen Datensatz herunter

Auch für Personen, die ungeduldig sind oder keine GPU haben, haben wir einen vorgekündigten Textdatensatz vorbereitet:

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-textual' , show_progress = True , local_cache = True )

Suche über Bild

Laden wir unsere zuvor gespeicherte Bildeinbettung, probieren Sie zufällig 10 Bilddokumente und finden Sie dann den nächsten Nachbarn von jeweils Top-1.

 from docarray import DocumentArray

img_da = DocumentArray . load_binary ( 'ttl-image' )

for d in img_da . sample ( 10 ):
    print ( da . find ( d . embedding , limit = 1 )[ 0 ]. text )

Showcase

Lustige Zeit! Beachten Sie, dass im Gegensatz zum vorherigen Beispiel hier die Eingabe ein Bild ist und der Satz die Ausgabe ist. Alle Sätze stammen aus dem Buch "Pride and Prejudice".


Außerdem gab es Wahrheit in seinem Aussehen	Gardiner lächelte	Wie heißt er	Bis zur Teezeit war die Dosis jedoch genug gewesen, und MR	Du siehst nicht gut aus


"Ein Gamester!" sie weinte	Wenn Sie meinen Namen in der Glocke erwähnen, werden Sie sich um Siedungen befinden	Egal, Miss Lizzys Haare	Elizabeth wird bald die Frau von MR sein	Ich habe sie in der Nacht zuvor gesehen

Rang Image-Text übereinstimmt über das Clip-Modell

Ab 0.3.0 CLIP-AS-Service fügt ein neuer /rank -Endpunkt hinzu, der die Quermodalübereinstimmungen entsprechend ihrer gemeinsamen Wahrscheinlichkeit im Clip-Modell erneut ansteigt. Beispielsweise übereinstimmt ein Bilddokument mit einigen vordefinierten Satzungen wie unten:

 from clip_client import Client
from docarray import Document

c = Client ( server = 'grpc://0.0.0.0:51000' )
r = c . rank (
    [
        Document (
            uri = '.github/README-img/rerank.png' ,
            matches = [
                Document ( text = f'a photo of a { p } ' )
                for p in (
                    'control room' ,
                    'lecture room' ,
                    'conference room' ,
                    'podium indoor' ,
                    'television studio' ,
                )
            ],
        )
    ]
)

print ( r [ '@m' , [ 'text' , 'scores__clip_score__value' ]])

 [['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'], 
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]

Man kann jetzt a photo of a television studio sehen, das mit clip_score Score bei 0.992 nach oben rangiert. In der Praxis kann man diesen Endpunkt verwenden, um das passende Ergebnis eines anderen Suchsystems neu zu erreichen, um die quermodale Suchqualität zu verbessern.

Rang Text-Image übereinstimmt über das Clip-Modell

Im Dall · E -Flow -Projekt ist Clip zum Ranking der generierten Ergebnisse von Dall · e gefordert. Es hat einen Testamentsvollstrecker, der über clip-client gewickelt ist, der .arank() - die asynchronisierte Version von .rank() aufruft:

 from clip_client import Client
from jina import Executor , requests , DocumentArray


class ReRank ( Executor ):
    def __init__ ( self , clip_server : str , ** kwargs ):
        super (). __init__ ( ** kwargs )
        self . _client = Client ( server = clip_server )

    @ requests ( on = '/' )
    async def rerank ( self , docs : DocumentArray , ** kwargs ):
        return await self . _client . arank ( docs )

Clip-as-Service im Dalle Flow verwendet

Fasziniert? Das kratzt nur die Oberfläche dessen, wozu Clip-as-Service fähig ist. Lesen Sie unsere Dokumente, um mehr zu erfahren.

Unterstützung

Treten Sie unserer Discord -Community bei und chatten Sie mit anderen Community -Mitgliedern über Ideen.
Beobachten Sie unsere Engineering alle Hände, um die neuen Funktionen von Jina zu lernen und über die neuesten KI-Techniken auf dem Laufenden zu bleiben.
Abonnieren Sie die neuesten Video -Tutorials auf unserem YouTube -Kanal

Begleiten Sie uns

Clip-as-Service wird von Jina AI unterstützt und unter Apache-2.0 lizenziert. Wir stellen aktiv KI-Ingenieure ein, Lösungsingenieure, um das nächste Neural-Such-Ökosystem in Open-Source zu bauen.

Expandieren

Zusätzliche Informationen

Version v0.8.3
Typ Anderer Quellcode
Aktualisierungszeit 2025-03-02
Größe 11.55MB
Kommt von Github

Ähnliche Anwendungen

Inf CLIP

2024-11-03
Vollständige chinesische Version mit Service

2023-10-20
als Loch

2023-05-29
als komplette Software

2023-05-29
Kopf AS-Code

2022-07-24
Clip-Eimer

2011-05-24

Bild	via https?
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/1/300/300", "matches": [{"text": "there is a woman in the photo"}, {"text": "there is a man in the photo"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " gibt: `"there is a woman in the photo" 0.626907229423523 "there is a man in the photo" 0.37309277057647705`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/133/300/300", "matches": [ {"text": "the blue car is on the left, the red car is on the right"}, {"text": "the blue car is on the right, the red car is on the left"}, {"text": "the blue car is on top of the red car"}, {"text": "the blue car is below the red car"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " gibt: `"the blue car is on the left, the red car is on the right" 0.5232442617416382 "the blue car is on the right, the red car is on the left" 0.32878655195236206 "the blue car is below the red car" 0.11064132302999496 "the blue car is on top of the red car" 0.03732786327600479`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/102/300/300", "matches": [{"text": "this is a photo of one berry"}, {"text": "this is a photo of two berries"}, {"text": "this is a photo of three berries"}, {"text": "this is a photo of four berries"}, {"text": "this is a photo of five berries"}, {"text": "this is a photo of six berries"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " gibt: `"this is a photo of three berries" 0.48507222533226013 "this is a photo of four berries" 0.2377079576253891 "this is a photo of one berry" 0.11304923892021179 "this is a photo of five berries" 0.0731358453631401 "this is a photo of two berries" 0.05045759305357933 "this is a photo of six berries" 0.04057715833187103`

clip as service

Text- und Bildeinbettung

Visuelle Argumentation

Dokumentation

Installieren

Server installieren

Client installieren

Schnelle Überprüfung

Fangen an

Grundnutzung

Text-to-Image-Quermodal-Suche in 10 Zeilen

Bilder laden

Codieren Bilder

Suche per Satz

Showcase

Bild-zu-Text-Quermodal-Suche in 10 Zeilen

Sätze codieren

Suche über Bild

Showcase

Rang Image-Text übereinstimmt über das Clip-Modell

Rang Text-Image übereinstimmt über das Clip-Modell

Unterstützung

Begleiten Sie uns

Inf CLIP

Vollständige chinesische Version mit Service

als Loch

als komplette Software

Kopf AS-Code

Clip-Eimer

chat.petals.dev

GPT Prompt Templates

GPTyped

waymo open dataset

chat.petals.dev

Sunamu

waymo open dataset

termwind

wp functions