TY - JOUR
T1 - MODE: a multimodal open-domain dialogue dataset with explanation
AU - Yin, Hang
AU - Lu, Pinren
AU - Li, Ziang
AU - Sun, Bin
AU - Li, Kan
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
PY - 2024
Y1 - 2024
AB - The need for high-quality data has been a key obstacle to research on dialogue tasks. Recent studies build datasets through manual annotation, web crawling, and similar methods. However, manually created data is expensive, and data collected from the internet often includes generic responses, meaningless statements, and even toxic information. With the development of large language models (LLMs), generating data with LLMs has broad application potential. For open-domain multimodal dialogue tasks, three drawbacks remain: 1) there is currently no unified and effective framework for collecting high-quality multimodal dialogue data; 2) LLM outputs in multimodal dialogue generation lack scene explanations, which hinders human understanding; 3) previous work has not quantitatively examined the impact of data quality on model performance. To improve data quality and reduce expenditure in the data collection process, we propose the Multimodal Data Construction Framework (MDCF). MDCF uses a modal conversion module and designs appropriate prompts for the LLM to generate well-formed, high-quality content. It also provides explanations for the multimodal dialogues, which helps in understanding conversation scenarios and facilitates subsequent manual quality inspection. Based on this framework, we release a Multimodal Open-domain Dialogue dataset with Explanation (MODE). We mainly compare against open-domain datasets such as Image-Chat. Both human evaluation and experiments show that high-quality datasets enable models to achieve greater understanding and generation capabilities.
KW - AIGC
KW - Explainability
KW - Multimodal data construction
KW - Open-domain dialogue
UR - http://www.scopus.com/inward/record.url?scp=85191808364&partnerID=8YFLogxK
U2 - 10.1007/s10489-024-05479-x
DO - 10.1007/s10489-024-05479-x
M3 - Article
AN - SCOPUS:85191808364
SN - 0924-669X
JO - Applied Intelligence
JF - Applied Intelligence
ER -