Unduh t2v_metrics - Unduh kode sumber t2v

VQAScore untuk Mengevaluasi Model Text-to-Visual [Halaman Proyek]

VQAScore memungkinkan peneliti mengevaluasi model teks-ke-gambar/video/3D secara otomatis menggunakan satu baris kode Python!

[Halaman VQAScore] [Demo VQAScore] [Halaman GenAI-Bench] [Demo GenAI-Bench] [CLIP-FlanT5 Model Zoo]

VQAScore: Mengevaluasi Pembuatan Teks-ke-Visual dengan Pembuatan Gambar-ke-Teks (ECCV 2024) [Makalah] [HF]
Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan

GenAI-Bench: Mengevaluasi dan Meningkatkan Komposisi Pembuatan Teks-ke-Visual (CVPR 2024, Makalah Pendek Terbaik @ SynData Workshop ) [Makalah] [HF]
Baiqi Li*, Zhiqiu Lin*, Deepak Pathak, Jiayao Li, Yixin Fei, Kewen Wu, Tiffany Ling, Xide Xia*, Pengchuan Zhang*, Graham Neubig*, Deva Ramanan* (*Rekan Pertama dan rekan penulis senior)

Berita

[2024/08/13] VQAScore disorot dalam laporan Imagen3 Google sebagai pengganti CLIPScore terkuat untuk evaluasi otomatis! GenAI-Bench dipilih sebagai salah satu tolok ukur utama untuk menampilkan penyelarasan gambar cepat Imagen3 yang unggul. Kudos to Google atas pencapaian ini! [Kertas]
[2024/07/01] VQAScore telah diterima di ECCV 2024!
[2024/06/20] GenAI-Bench meraih Best Short Paper pada Workshop SynData CVPR'24! [Situs Lokakarya].

VQAScore secara signifikan mengungguli metrik sebelumnya seperti CLIPScore dan PickScore pada perintah teks komposisi, dan jauh lebih sederhana dibandingkan penemuan sebelumnya (misalnya, ImageReward, HPSv2, TIFA, Davidsonian, VPEval, VIEScore) yang memanfaatkan umpan balik manusia atau model kepemilikan seperti ChatGPT dan GPT -4Visi.

Mulai cepat

Instal paket melalui:

git clone https://github.com/linzhiqiu/t2v_metrics
cd t2v_metrics

conda create -n t2v python=3.10 -y
conda activate t2v
conda install pip -y

pip install torch torchvision torchaudio
pip install git+https://github.com/openai/CLIP.git
pip install -e . # local pip install

Atau Anda dapat menginstal melalui pip install t2v-metrics .

Sekarang, hanya kode Python berikut yang Anda perlukan untuk menghitung VQAScore untuk penyelarasan gambar-teks (skor yang lebih tinggi menunjukkan kesamaan yang lebih besar):

 import t2v_metrics
clip_flant5_score = t2v_metrics . VQAScore ( model = 'clip-flant5-xxl' ) # our recommended scoring model

### For a single (image, text) pair
image = "images/0.png" # an image path in string format
text = "someone talks on the phone angrily while another person sits happily"
score = clip_flant5_score ( images = [ image ], texts = [ text ])

### Alternatively, if you want to calculate the pairwise similarity scores 
### between M images and N texts, run the following to return a M x N score tensor.
images = [ "images/0.png" , "images/1.png" ]
texts = [ "someone talks on the phone angrily while another person sits happily" ,
         "someone talks on the phone happily while another person sits angrily" ]
scores = clip_flant5_score ( images = images , texts = texts ) # scores[i][j] is the score between image i and text j

Catatan tentang GPU dan cache

Penggunaan GPU : Secara default, kode ini menggunakan perangkat cuda pertama di mesin Anda. Kami merekomendasikan GPU 40GB untuk model VQAScore terbesar seperti clip-flant5-xxl dan llava-v1.5-13b . Jika Anda memiliki memori GPU terbatas, pertimbangkan model yang lebih kecil seperti clip-flant5-xl dan llava-v1.5-7b .
Direktori cache : Anda dapat mengubah folder cache yang menyimpan semua pos pemeriksaan model (defaultnya adalah ./hf_cache/ ) dengan memperbarui HF_CACHE_DIR di t2v_metrics/constants.py.

Penggunaan Tingkat Lanjut

Pemrosesan batch untuk lebih banyak pasangan gambar-teks
Periksa semua model yang didukung
Menyesuaikan template tanya jawab (untuk VQAScore)
Mereproduksi hasil kertas VQAScore
Mereproduksi hasil makalah GenAI-Bench
Menggunakan GPT-4o untuk VQAScore
Menerapkan metrik penilaian Anda sendiri
Pembuatan teks (VQA) menggunakan CLIP-FlanT5

Pemrosesan batch untuk lebih banyak pasangan gambar-teks

Dengan kumpulan besar M gambar x N teks, Anda dapat mempercepatnya menggunakan fungsi batch_forward() .

 import t2v_metrics
clip_flant5_score = t2v_metrics . VQAScore ( model = 'clip-flant5-xxl' )

# The number of images and texts per dictionary must be consistent.
# E.g., the below example shows how to evaluate 4 generated images per text
dataset = [
  { 'images' : [ "images/0/DALLE3.png" , "images/0/Midjourney.jpg" , "images/0/SDXL.jpg" , "images/0/DeepFloyd.jpg" ], 'texts' : [ "The brown dog chases the black dog around the tree." ]},
  { 'images' : [ "images/1/DALLE3.png" , "images/1/Midjourney.jpg" , "images/1/SDXL.jpg" , "images/1/DeepFloyd.jpg" ], 'texts' : [ "Two cats sit at the window, the blue one intently watching the rain, the red one curled up asleep." ]},
  #...
]
scores = clip_flant5_score . batch_forward ( dataset = dataset , batch_size = 16 ) # (n_sample, 4, 1) tensor

Periksa semua model yang didukung

Saat ini kami mendukung menjalankan VQAScore dengan CLIP-FlanT5, LLaVA-1.5, dan InstructBLIP. Untuk ablasi, kami juga menyertakan CLIPScore, BLIPv2Score, PickScore, HPSv2Score, dan ImageReward:

 llava_score = t2v_metrics . VQAScore ( model = 'llava-v1.5-13b' )
instructblip_score = t2v_metrics . VQAScore ( model = 'instructblip-flant5-xxl' )
clip_score = t2v_metrics . CLIPScore ( model = 'openai:ViT-L-14-336' )
blip_itm_score = t2v_metrics . ITMScore ( model = 'blip2-itm' ) 
pick_score = t2v_metrics . CLIPScore ( model = 'pickscore-v1' )
hpsv2_score = t2v_metrics . CLIPScore ( model = 'hpsv2' ) 
image_reward_score = t2v_metrics . ITMScore ( model = 'image-reward-v1' )

Anda dapat memeriksa semua model yang didukung dengan menjalankan perintah di bawah ini:

 print ( "VQAScore models:" )
t2v_metrics . list_all_vqascore_models ()

print ( "ITMScore models:" )
t2v_metrics . list_all_itmscore_models ()

print ( "CLIPScore models:" )
t2v_metrics . list_all_clipscore_models ()

Menyesuaikan template tanya jawab (untuk VQAScore)

Tanya jawab sedikit mempengaruhi skor akhir, seperti terlihat pada Lampiran makalah kami. Kami menyediakan template default sederhana untuk setiap model dan tidak menyarankan mengubahnya demi reproduktifitas. Namun, kami ingin menunjukkan bahwa pertanyaan dan jawaban dapat dengan mudah dimodifikasi. Misalnya, CLIP-FlanT5 dan LLaVA-1.5 menggunakan templat berikut, yang dapat ditemukan di t2v_metrics/models/vqascore_models/clip_t5_model.py:

 # {} will be replaced by the caption
default_question_template = 'Does this figure show "{}"? Please answer yes or no.'
default_answer_template = 'Yes'

Anda dapat menyesuaikan templat dengan meneruskan parameter question_template dan answer_template ke dalam fungsi forward() atau batch_forward() :

 # Use a different question for VQAScore
scores = clip_flant5_score ( images = images ,
                           texts = texts ,
                           question_template = 'Is this figure showing "{}"? Please answer yes or no.' ,
                           answer_template = 'Yes' )

Anda juga dapat menghitung P(caption | image) (VisualGPTScore) alih-alih P(answer | image, question):

 scores = clip_flant5_score ( images = images ,
                           texts = texts ,
                           question_template = "" , # no question
                           answer_template = "{}" ) # this computes P(caption | image)

Mereproduksi hasil kertas VQAScore

Eval.py kami memungkinkan Anda menjalankan 10 tolok ukur penyelarasan gambar/visi/3D dengan mudah (misalnya, Winoground/TIFA160/SeeTrue/StanfordT23D/T2VScore):

python eval.py --model clip-flant5-xxl # for VQAScore
python eval.py --model openai:ViT-L-14 # for CLIPScore

# You can optionally specify question/answer template, for example:
python eval.py --model clip-flant5-xxl --question " Is the figure showing '{}'? " --answer " Yes "

Mereproduksi hasil makalah GenAI-Bench

genai_image_eval.py dan genai_video_eval.py kami dapat mereproduksi hasil GenAI-Bench. Sebagai tambahan, genai_image_ranking.py dapat mereproduksi hasil Peringkat GenAI:

 # GenAI-Bench
python genai_image_eval.py --model clip-flant5-xxl
python genai_video_eval.py --model clip-flant5-xxl

# GenAI-Rank
python genai_image_ranking.py --model clip-flant5-xxl --gen_model DALLE_3
python genai_image_ranking.py --model clip-flant5-xxl --gen_model SDXL_Base

Menggunakan GPT-4o untuk VQAScore!

Kami menerapkan VQAScore menggunakan GPT-4o untuk mencapai kinerja baru yang canggih. Silakan lihat t2v_metrics/gpt4_eval.py sebagai contoh. Berikut cara menggunakannya di baris perintah:

 openai_key = # Your OpenAI key
score_func = t2v_metrics . get_score_model ( model = "gpt-4o" , device = "cuda" , openai_key = openai_key , top_logprobs = 20 ) # We find top_logprobs=20 to be sufficient for most (image, text) samples. Consider increase this number if you get errors (the API cost will not increase).

Menerapkan metrik penilaian Anda sendiri

Anda dapat dengan mudah menerapkan metrik penilaian Anda sendiri. Misalnya, jika Anda memiliki model VQA yang Anda yakini lebih efektif, Anda dapat memasukkannya ke dalam direktori di t2v_metrics/models/vqascore_models. Untuk panduan, lihat contoh implementasi LLaVA-1.5 dan InstructBLIP kami sebagai titik awal.

Pembuatan teks (VQA) menggunakan CLIP-FlanT5

Untuk menghasilkan teks (captioning atau tugas VQA) menggunakan CLIP-FlanT5, silakan gunakan kode di bawah ini:

 import t2v_metrics
clip_flant5_score = t2v_metrics . VQAScore ( model = 'clip-flant5-xxl' )

images = [ "images/0.png" , "images/0.png" ] # A list of images
prompts = [ "Please describe this image: " , "Does the image show 'someone talks on the phone angrily while another person sits happily'?" ] # Corresponding prompts
clip_flant5_score . model . generate ( images = images , prompts = prompts )

Kutipan

Jika Anda merasa repositori ini berguna untuk penelitian Anda, silakan gunakan yang berikut ini (UNTUK MEMPERBARUI dengan ID ArXiv).

 @article{lin2024evaluating,
  title={Evaluating Text-to-Visual Generation with Image-to-Text Generation},
  author={Lin, Zhiqiu and Pathak, Deepak and Li, Baiqi and Li, Jiayao and Xia, Xide and Neubig, Graham and Zhang, Pengchuan and Ramanan, Deva},
  journal={arXiv preprint arXiv:2404.01291},
  year={2024}
}