This repository contains the Composed Image Retrieval on Real-life images (CIRR) dataset.
For details please see our ICCV 2021 paper - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models.
Our dataset is structured similarly to Fashion-IQ, an existing dataset for this task. The files include annotations, raw images, and optional pre-extracted image features.
Obtain the annotations by:
# create a `data` folder at your desired location
mkdir data
cd data
# clone the cirr_dataset branch to the local data/cirr folder
git clone -b cirr_dataset [email protected]:Cuberick-Orion/CIRR.git cirr
The data/cirr folder contains all relevant annotations. The file structure is described below.
Updated October 2024 -- Please contact us if you are having trouble gaining access to the raw images from NLVR2.
Since late 2023, multiple research groups have made us aware that the NLVR2 team is not responding to their requests. To this end, please see the following steps for obtaining the raw images:
Important
The NLVR2 repository provides another way to obtain the images, which is to download them by URL. We do not recommend it, as many of the links are broken and the downloaded files lack the sub-folder structure of the /train folder. Instead, please follow the instructions above to download the raw images directly.
The available types of image features are ResNet-152 image features (img_feat_res152) and Faster-RCNN features (img_feat_frcnn).
Each zip file we provide contains a folder of individual image feature files (.pkl). Once downloaded, unzip it into data/cirr/, following the file structure below.
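As an illustration, a minimal Python sketch of the extraction step; the archive name img_feat_res152.zip is hypothetical, so substitute the file you actually downloaded:

import zipfile

# Extract a downloaded feature archive into data/cirr/, so its
# contained folder lands under the structure shown below.
# (archive name is hypothetical; use the file you downloaded)
with zipfile.ZipFile("img_feat_res152.zip") as zf:
    zf.extractall("data/cirr/")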
data
└─── cirr
├─── captions
│ cap.VER.test1.json
│ cap.VER.train.json
│ cap.VER.val.json
├─── captions_ext
│ cap.ext.VER.test1.json
│ cap.ext.VER.train.json
│ cap.ext.VER.val.json
├─── image_splits
│ split.VER.test1.json
│ split.VER.train.json
│ split.VER.val.json
├─── img_raw
│ ├── train
│ │ ├── 0 # sub-level folder structure inherited from NLVR2 (carries no special meaning in CIRR)
│ │ │ <IMG0_ID>.png
│ │ │ <IMG1_ID>.png
│ │ │ ...
│ │ ├── 1
│ │ │ <IMG0_ID>.png
│ │ │ <IMG1_ID>.png
│ │ │ ...
│ │ ├── 2
│ │ │ <IMG0_ID>.png
│ │ │ <IMG1_ID>.png
│ │ └── ...
│ ├── dev
│ │ <IMG0_ID>.png
│ │ <IMG1_ID>.png
│ │ ...
│ └── test1
│ <IMG0_ID>.png
│ <IMG1_ID>.png
│ ...
├─── img_feat_res152
│ <Same subfolder structure as above>
└─── img_feat_frcnn
<Same subfolder structure as above>
captions/cap.VER.SPLIT.json
A list of elements, where each element contains core information on a query-target pair.
Details on each entry can be found in the supp. mat. Sec. G of our paper.
{"pairid": 12063,
"reference": "test1-147-1-img1",
"target_hard": "test1-83-0-img1",
"target_soft": {"test1-83-0-img1": 1.0},
"caption": "remove all but one dog and add a woman hugging it",
"img_set": {"id": 1,
"members": ["test1-147-1-img1",
"test1-1001-2-img0",
"test1-83-1-img1",
"test1-359-0-img1",
"test1-906-0-img1",
"test1-83-0-img1"],
"reference_rank": 3,
"target_rank": 4}
}
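For example, a minimal Python sketch for reading these annotations; the path follows the structure above, and VER stands for the release version in your copy:

import json

# Load the core annotations for the validation split.
# Replace VER with the release version in your copy of the dataset.
with open("data/cirr/captions/cap.VER.val.json") as f:
    pairs = json.load(f)

for pair in pairs:
    # Each element is one query-target pair, as in the sample above.
    print(pair["pairid"], pair["reference"], pair["target_hard"], pair["caption"])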
captions_ext/cap.ext.VER.SPLIT.json
A list of elements, where each element contains auxiliary annotations on a query-target pair.
Details on the auxiliary annotations can be found in the supp. mat. Sec. C of our paper.
{"pairid": 12063,
"reference": "test1-147-1-img1",
"target_hard": "test1-83-0-img1",
"caption_extend": {"0": "being a photo of dogs",
"1": "add a big dog",
"2": "more focused on the hugging",
"3": "background should contain grass"}
}
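The auxiliary annotations can be matched back to the core annotations via pairid. A sketch, under the same path assumptions as above:

import json

# Load both annotation files (replace VER as above).
with open("data/cirr/captions/cap.VER.val.json") as f:
    pairs = json.load(f)
with open("data/cirr/captions_ext/cap.ext.VER.val.json") as f:
    pairs_ext = json.load(f)

# Index the auxiliary annotations by pairid for lookup.
ext_by_pairid = {p["pairid"]: p["caption_extend"] for p in pairs_ext}

for pair in pairs:
    # Dict keyed "0"-"3" as in the sample above; None if a pair has no entry.
    aux_captions = ext_by_pairid.get(pair["pairid"])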
image_splits/split.VER.SPLIT.json
A mapping from each image name to its relative file path, e.g.:
"test1-147-1-img1": "./test1/test1-147-1-img1.png",
# or
"train-11041-2-img0": "./train/34/train-11041-2-img0.png"
img_feat_<...>/
Pre-extracted image features. Each feature file is named after its image, i.e., <IMG0_ID> = "test1-147-1-img1.png".replace('.png','.pkl'), so the features for test1-147-1-img1.png are saved as test1-147-1-img1.pkl and each file can be directly indexed by its image name.
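For instance, a feature file can then be loaded by image name. A sketch, assuming the ResNet-152 features and the structure above; what each .pkl contains depends on the feature type:

import pickle

# Load the pre-extracted feature for one test1 image by name.
with open("data/cirr/img_feat_res152/test1/test1-147-1-img1.pkl", "rb") as f:
    feat = pickle.load(f)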
We do not publish the ground truth for the test split of CIRR. Instead, an evaluation server is hosted here, should you prefer to publish results on the test split. The functions of the test-split server will be incrementally updated.
See test-split server instructions.
The server is hosted independently at CECS ANU, so please email us if the site is down.
We have licensed the annotations of CIRR under the MIT License. Please refer to the LICENSE file for details.
Following NLVR2 Licensing, we do not license the images used in CIRR, as we do not hold the copyright to them.
The images used in CIRR are sourced from the NLVR2 dataset. Users shall be bound by its Terms of Service.
Please cite our paper if it helps your research:
@InProceedings{Liu_2021_ICCV,
author = {Liu, Zheyuan and Rodriguez-Opazo, Cristian and Teney, Damien and Gould, Stephen},
title = {Image Retrieval on Real-Life Images With Pre-Trained Vision-and-Language Models},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {2125-2134}
}
If you have any questions regarding our dataset, model, or publication, please create an issue in the project repository, or email us.