MMDialog
Version 1.0.0
This repository is the official site of the ACL'23 paper: MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation.
An Example Dialogue from MMDialog:
Statistics:
If you use MMDialog in your work, please cite our paper:
```
@inproceedings{feng-etal-2023-MMDialog,
    title = "{MMD}ialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation",
    author = "Feng, Jiazhan and Sun, Qingfeng and Xu, Can and Zhao, Pu and Yang, Yaming and Tao, Chongyang and Zhao, Dongyan and Lin, Qingwei",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.405",
    doi = "10.18653/v1/2023.acl-long.405",
    pages = "7348--7363"
}
```
Dataset Folder Format:
File: conversations.json
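A rough, hypothetical sketch of loading and inspecting this file is shown below. The structure and every field name used here (`conversation`, `__TYPE__`, `__TEXT__`, `__MEDIA__`) are illustrative assumptions only and may not match the released schema; please follow the documentation provided with the shared data.

```python
# Minimal, hypothetical loading sketch for conversations.json.
# NOTE: all field names below ("conversation", "__TYPE__", "__TEXT__",
# "__MEDIA__") are assumptions for illustration, not the confirmed schema.
import json

with open("conversations.json", "r", encoding="utf-8") as f:
    dialogues = json.load(f)  # assumed: a list of dialogue records

print(f"Loaded {len(dialogues)} dialogues")

# Inspect the first dialogue: print textual elements and photo placeholders.
for turn in dialogues[0]["conversation"]:   # assumed: list of turns
    for element in turn:                    # assumed: list of elements per turn
        if element.get("__TYPE__") == "text":
            print("TEXT :", element.get("__TEXT__"))
        elif element.get("__TYPE__") == "photo":
            print("PHOTO:", element.get("__MEDIA__"))
```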
Note:
If you do not meet all of these requirements, we will not share the dataset with you. To request access, please fill out the form below:
Item | Description |
---|---|
Your Name | [Your name here] |
Your Role | [master’s student / doctoral candidate / post-doc / faculty / research-focused employee / others] |
Your Study or Work Organization | e.g. Microsoft Research, DeepMind, Cornell University, ... |
Your Personal Academic Homepage With Publications | Your [Google Scholar profile] or [homepage hosted on your organization's website (e.g. yourname.people.xxx.edu / yourname.xxx.people.msr.microsoft.com)], with your publications listed. |
Non-commercial Use | I [promise / cannot promise] that I will not use the MMDialog dataset in commercial scenarios or products. |
Sharing Limitation | I [promise / cannot promise] that I will not share the MMDialog dataset with others without your qualification review and permission. |
Your Plan | (Describe your research plan and how you intend to use and analyze this data in your research; at least 50 words.) |
Then use your edu or research email account to send the completed form to [[email protected]] for review. If you meet all of the requirements, we will share with you, within a week, a cloud folder that stores the pre-processed dataset.