clip as service

clip as service

기타 소스코드

v0.8.3

다운로드

클립 서비스 로고 : 구조화되지 않은 데이터의 데이터 구조

클립 서비스는 이미지와 텍스트를 포함시키기위한 저수준 높은 확률 서비스입니다. 마이크로 서비스로 신경 검색 솔루션에 쉽게 통합 될 수 있습니다.

FAST : Tensorrt, Onnx 런타임 및 800qps가있는 Pytorch와 함께 클립 모델을 제공합니다 ^[*] . 큰 데이터 및 장기 실행 작업을 위해 설계된 요청 및 응답에 대한 비 블로킹 듀플렉스 스트리밍.

? 탄성 : 자동로드 밸런싱으로 단일 GPU의 여러 클립 모델을 수평으로 확장 및 아래로 스케일링합니다.

? 사용하기 쉬운 : 학습 곡선 없음, 클라이언트 및 서버에서 미니멀리스트 디자인. 이미지 및 문장 임베딩을위한 직관적이고 일관된 API.

? 현대 : 비동기 클라이언트 지원. TLS를 사용하여 GRPC, HTTP, WebSocket 프로토콜을 쉽게 전환합니다.

? 통합 : Jina 및 Docarray를 포함한 신경 검색 생태계와의 원활한 통합. 교차 모달 및 다중 모달 솔루션을 즉시 구축하십시오.

^{[*] Geforce RTX 3090의 기본 구성 (단일 복제, Pytorch no Jit).}

텍스트 및 이미지 임베딩

HTTPS를 통해?

Grpc를 통해??

curl 
-X POST https:// < your-inference-address > -http.wolf.jina.ai/post 
-H ' Content-Type: application/json ' 
-H ' Authorization: <your access token> ' 
-d ' {"data":[{"text": "First do it"}, 
    {"text": "then do it right"}, 
    {"text": "then do it better"}, 
    {"uri": "https://picsum.photos/200"}], 
    "execEndpoint":"/"} '

 # pip install clip-client
from clip_client import Client

c = Client (
    'grpcs://<your-inference-address>-grpc.wolf.jina.ai' ,
    credential = { 'Authorization' : '<your access token>' },
)

r = c . encode (
    [
        'First do it' ,
        'then do it right' ,
        'then do it better' ,
        'https://picsum.photos/200' ,
    ]
)
print ( r )

시각적 추론

객체 인식, 객체 계산, 색상 인식 및 공간 관계 이해의 네 가지 기본적인 시각적 추론 기술이 있습니다. 일부 시도해 봅시다 :

결과를 예측하려면 jq (JSON 프로세서)를 설치해야합니다.

영상	HTTPS를 통해?
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/1/300/300", "matches": [{"text": "there is a woman in the photo"}, {"text": "there is a man in the photo"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 제공 : `"there is a woman in the photo" 0.626907229423523 "there is a man in the photo" 0.37309277057647705`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/133/300/300", "matches": [ {"text": "the blue car is on the left, the red car is on the right"}, {"text": "the blue car is on the right, the red car is on the left"}, {"text": "the blue car is on top of the red car"}, {"text": "the blue car is below the red car"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 제공 : `"the blue car is on the left, the red car is on the right" 0.5232442617416382 "the blue car is on the right, the red car is on the left" 0.32878655195236206 "the blue car is below the red car" 0.11064132302999496 "the blue car is on top of the red car" 0.03732786327600479`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/102/300/300", "matches": [{"text": "this is a photo of one berry"}, {"text": "this is a photo of two berries"}, {"text": "this is a photo of three berries"}, {"text": "this is a photo of four berries"}, {"text": "this is a photo of five berries"}, {"text": "this is a photo of six berries"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 제공 : `"this is a photo of three berries" 0.48507222533226013 "this is a photo of four berries" 0.2377079576253891 "this is a photo of one berry" 0.11304923892021179 "this is a photo of five berries" 0.0731358453631401 "this is a photo of two berries" 0.05045759305357933 "this is a photo of six berries" 0.04057715833187103`

선적 서류 비치

설치하다

클립 서비스는 독립적으로 설치할 수있는 두 개의 파이썬 패키지 clip-server 및 clip-client 로 구성됩니다. 둘 다 Python 3.7+가 필요합니다.

서버를 설치하십시오

Pytorch 런타임 ch	Onnx 런타임 time	Tensorrt 런타임 ⚡⚡⚡
pip install clip-server	pip install " clip-server[onnx] "	pip install nvidia-pyindex pip install " clip-server[tensorrt] "

무료 GPU/TPU를 활용하여 Google Colab에서 서버를 호스팅 할 수도 있습니다.

클라이언트를 설치하십시오

pip install clip-client

빠른 점검

설치 후 간단한 연결 확인을 실행할 수 있습니다.

c/s	명령	출력을 기대합니다
섬기는 사람	python -m clip_server
고객	from clip_client import Client c = Client ( 'grpc://0.0.0.0:23456' ) c . profile ()

인트라넷 또는 공개 IP 주소로 0.0.0.0 변경하여 개인 및 공개 네트워크를 통한 연결을 테스트 할 수 있습니다.

시작하세요

기본 사용

서버를 시작하십시오 : python -m clip_server . 주소와 포트를 기억하십시오.

클라이언트 생성 :

 from clip_client import Client

 c = Client ( 'grpc://0.0.0.0:51000' )

문장 임베딩을 얻으려면 :

 r = c . encode ([ 'First do it' , 'then do it right' , 'then do it better' ])

print ( r . shape )  # [3, 512]

이미지 임베딩을 얻으려면 :

 r = c . encode ([ 'apple.png' ,  # local image 
              'https://clip-as-service.jina.ai/_static/favicon.png' ,  # remote image
              'data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7' ])  # in image URI

print ( r . shape )  # [3, 512]

보다 포괄적 인 서버 및 클라이언트 사용자 가이드는 문서에서 찾을 수 있습니다.

10 줄의 텍스트-이미지 교차 모달 검색

클립 서비스를 사용하여 텍스트-이미지 검색을 작성해 봅시다. 즉, 사용자는 문장을 입력 할 수 있으며 프로그램은 일치하는 이미지를 반환합니다. 우리는 완전히 데이터 세트 및 docarray 패키지와 같은 모습을 사용합니다. Docarray는 clip-client 내에 업스트림 의존성으로 포함되므로 별도로 설치할 필요가 없습니다.

이미지로드

먼저 이미지를로드합니다. Jina Cloud에서 단순히 가져올 수 있습니다.

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-original' , show_progress = True , local_cache = True )

또는 TTL 데이터 세트를 다운로드, 실선 해제, 수동으로로드하십시오

또는 공식 웹 사이트, 압축 및로드 이미지처럼 완전히 보일 수 있습니다.

 from docarray import DocumentArray

da = DocumentArray . from_files ([ 'left/*.jpg' , 'right/*.jpg' ])

데이터 세트에는 12,032 개의 이미지가 포함되어 있으므로 당기는 데 시간이 걸릴 수 있습니다. 완료되면 시각화하고 해당 이미지의 첫 번째 맛을 얻을 수 있습니다.

 da . plot_image_sprites ()

이미지 스프라이트의 시각화는 데이터 세트처럼 보입니다.

이미지를 인코딩합니다

python -m clip_server 로 서버를 시작하십시오. GRPC 프로토콜을 사용하여 0.0.0.0:51000 이라고 가정 해 봅시다 (서버를 실행 한 후이 정보를 얻을 수 있음).

Python 클라이언트 스크립트 작성 :

 from clip_client import Client

c = Client ( server = 'grpc://0.0.0.0:51000' )

da = c . encode ( da , show_progress = True )

GPU 및 클라이언트 서버 네트워크에 따라 12K 이미지를 포함하는 데 시간이 걸릴 수 있습니다. 제 경우에는 약 2 분이 걸렸습니다.

사전 인코딩 된 데이터 세트를 다운로드하십시오

당신이 참을성이 없거나 GPU가 없다면 기다리는 것은 지옥이 될 수 있습니다. 이 경우 사전 인코딩 된 이미지 데이터 세트를 가져올 수 있습니다.

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-embedding' , show_progress = True , local_cache = True )

문장을 통해 검색하십시오

사용자가 문장을 입력 할 수 있도록 간단한 프롬프트를 작성하겠습니다.

 while True :
    vec = c . encode ([ input ( 'sentence> ' )])
    r = da . find ( query = vec , limit = 9 )
    r [ 0 ]. plot_image_sprites ()

유리 진열장

이제 임의의 영어 문장을 입력하고 상위 9 개의 일치하는 이미지를 볼 수 있습니다. 검색은 빠르고 본능적입니다. 재미있게 보내자 :

"행복한 감자"	"Super Evil ai"	"버거를 즐기는 남자"

"고양이 교수는 매우 심각하다"	"자아 엔지니어는 부모와 함께 산다"	"내일은 없을 것이므로 건강에 해로운 음식을 먹자"

다음 예제에 대한 임베딩 결과를 저장하겠습니다.

 da . save_binary ( 'ttl-image' )

10 줄의 이미지-텍스트 크로스 모달 검색

또한 마지막 프로그램의 입력 및 출력을 전환하여 이미지-텍스트 검색을 달성 할 수 있습니다. 정확하게, 쿼리 이미지가 주어지면 이미지를 가장 잘 설명하는 문장을 찾으십시오.

"프라이드와 편견"이라는 책의 모든 문장을 사용합시다.

 from docarray import Document , DocumentArray

d = Document ( uri = 'https://www.gutenberg.org/files/1342/1342-0.txt' ). load_uri_to_text ()
da = DocumentArray (
    Document ( text = s . strip ()) for s in d . text . replace ( ' r n ' , '' ). split ( '.' ) if s . strip ()
)

우리가 무엇을 얻었는지 살펴 보겠습니다.

 da . summary ()

            Documents Summary            
                                         
  Length                 6403            
  Homogenous Documents   True            
  Common Attributes      ('id', 'text')  
                                         
                     Attributes Summary                     
                                                            
  Attribute   Data type   #Unique values   Has empty value  
 ────────────────────────────────────────────────────────── 
  id          ('str',)    6403             False            
  text        ('str',)    6030             False

문장을 인코딩합니다

이제이 6,403 개의 문장을 인코딩하면 GPU 및 네트워크에 따라 10 초 이하가 될 수 있습니다.

 from clip_client import Client

c = Client ( 'grpc://0.0.0.0:51000' )

r = c . encode ( da , show_progress = True )

사전 인코딩 된 데이터 세트를 다운로드하십시오

다시 말하지만, 참을성이 없거나 GPU가없는 사람들의 경우, 우리는 사전 인코딩 된 텍스트 데이터 세트를 준비했습니다.

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-textual' , show_progress = True , local_cache = True )

이미지를 통해 검색하십시오

이전에 저장된 이미지 임베딩을로드하고 무작위로 10 개의 이미지 문서를 샘플링 한 다음 각각 가장 가까운 이웃을 찾으십시오.

 from docarray import DocumentArray

img_da = DocumentArray . load_binary ( 'ttl-image' )

for d in img_da . sample ( 10 ):
    print ( da . find ( d . embedding , limit = 1 )[ 0 ]. text )

유리 진열장

즐거운 시간! 이전 예제와 달리 입력은 이미지이며 문장은 출력입니다. 모든 문장은 "자부심과 편견"이라는 책에서 나옵니다.


게다가, 그의 외모에는 진실이있었습니다	가디너는 웃었다	그의 이름은 무엇입니까?	그러나 티 타임으로, 복용량은 충분했고 MR은	당신은 잘 보이지 않습니다


"게임 레스터!" 그녀는 울었다	벨에서 내 이름을 언급하면 참석할 것입니다.	Lizzy의 머리카락을 신경 쓰지 마십시오	엘리자베스는 곧 씨의 아내가 될 것입니다	나는 지난 밤에 그들을 보았다

클립 모델을 통해 이미지 텍스트 일치 순위

0.3.0 에서 클립 서비스는 클립 모델에서 공동 가능성에 따라 크로스 모달과 일치하는 새로운 /rank 종점을 추가합니다. 예를 들어, 미리 정의 된 문장이있는 이미지 문서가 다음과 같이 일치합니다.

 from clip_client import Client
from docarray import Document

c = Client ( server = 'grpc://0.0.0.0:51000' )
r = c . rank (
    [
        Document (
            uri = '.github/README-img/rerank.png' ,
            matches = [
                Document ( text = f'a photo of a { p } ' )
                for p in (
                    'control room' ,
                    'lecture room' ,
                    'conference room' ,
                    'podium indoor' ,
                    'television studio' ,
                )
            ],
        )
    ]
)

print ( r [ '@m' , [ 'text' , 'scores__clip_score__value' ]])

 [['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'], 
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]

이제 a photo of a television studio 0.992 의 clip_score 점수로 최고 순위에 올랐습니다. 실제로,이 엔드 포인트를 사용하여 다른 검색 시스템의 일치 결과를 다시 순위로 인정하여 교차 모달 검색 품질을 향상시킬 수 있습니다.

클립 모델을 통해 텍스트 이미지 일치 순위

Dall · e 흐름 프로젝트에서 클립은 Dall · e의 생성 된 결과 순위를 매기는 것을 요구합니다. clip-client 위에 갇힌 집행자가 .rank() .arank()

 from clip_client import Client
from jina import Executor , requests , DocumentArray


class ReRank ( Executor ):
    def __init__ ( self , clip_server : str , ** kwargs ):
        super (). __init__ ( ** kwargs )
        self . _client = Client ( server = clip_server )

    @ requests ( on = '/' )
    async def rerank ( self , docs : DocumentArray , ** kwargs ):
        return await self . _client . arank ( docs )

Dalle 흐름에 사용되는 클립 서비스

흥미? 그것은 클립 서비스가 할 수있는 것의 표면을 긁는 것입니다. 더 많은 것을 배우려면 문서를 읽으십시오.

지원하다

불화 커뮤니티에 가입하고 아이디어에 대해 다른 커뮤니티 회원들과 채팅하십시오.
Jina의 새로운 기능을 배우고 최신 AI 기술로 최신 정보를 얻으려면 엔지니어링 모든 손을 지켜보십시오.
YouTube 채널에서 최신 비디오 자습서 구독

우리와 함께하십시오

클립 서비스는 Jina AI가 지원하고 Apache-2.0에 따라 라이센스를 부여합니다. 우리는 AI 엔지니어, 솔루션 엔지니어를 적극적으로 고용하여 Open-Source에서 다음 신경 검색 생태계를 구축하고 있습니다.

확장하다

추가 정보

버전 v0.8.3
유형 기타 소스코드
업데이트 시간 2025-03-02
크기 11.55MB
출처 Github

영상	HTTPS를 통해?
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/1/300/300", "matches": [{"text": "there is a woman in the photo"}, {"text": "there is a man in the photo"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 제공 : `"there is a woman in the photo" 0.626907229423523 "there is a man in the photo" 0.37309277057647705`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/133/300/300", "matches": [ {"text": "the blue car is on the left, the red car is on the right"}, {"text": "the blue car is on the right, the red car is on the left"}, {"text": "the blue car is on top of the red car"}, {"text": "the blue car is below the red car"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 제공 : `"the blue car is on the left, the red car is on the right" 0.5232442617416382 "the blue car is on the right, the red car is on the left" 0.32878655195236206 "the blue car is below the red car" 0.11064132302999496 "the blue car is on top of the red car" 0.03732786327600479`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/102/300/300", "matches": [{"text": "this is a photo of one berry"}, {"text": "this is a photo of two berries"}, {"text": "this is a photo of three berries"}, {"text": "this is a photo of four berries"}, {"text": "this is a photo of five berries"}, {"text": "this is a photo of six berries"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " 제공 : `"this is a photo of three berries" 0.48507222533226013 "this is a photo of four berries" 0.2377079576253891 "this is a photo of one berry" 0.11304923892021179 "this is a photo of five berries" 0.0731358453631401 "this is a photo of two berries" 0.05045759305357933 "this is a photo of six berries" 0.04057715833187103`

텍스트 및 이미지 임베딩

시각적 추론

선적 서류 비치

설치하다

서버를 설치하십시오

클라이언트를 설치하십시오

빠른 점검

시작하세요

기본 사용

10 줄의 텍스트-이미지 교차 모달 검색

이미지로드

이미지를 인코딩합니다

문장을 통해 검색하십시오

유리 진열장

10 줄의 이미지-텍스트 크로스 모달 검색

문장을 인코딩합니다

이미지를 통해 검색하십시오

유리 진열장

클립 모델을 통해 이미지 텍스트 일치 순위

클립 모델을 통해 텍스트 이미지 일치 순위

지원하다

우리와 함께하십시오

Inf CLIP

풀 서비스 중국어 버전

구멍으로

구멍 소프트웨어로

헤드 AS 코드

클립 버킷

chat.petals.dev

GPT Prompt Templates

GPTyped

waymo open dataset

chat.petals.dev

Sunamu

waymo open dataset

termwind

wp functions