TY - JOUR
T1 - MODE: a multimodal open-domain dialogue dataset with explanation
AU - Yin, Hang
AU - Lu, Pinren
AU - Li, Ziang
AU - Sun, Bin
AU - Li, Kan
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
PY - 2024
Y1 - 2024
AB - The need for high-quality data has been a key obstacle to research on dialogue tasks. Recent studies build datasets through manual annotation, web crawling, and similar methods. However, manually created data is expensive, and data collected from the internet often includes generic responses, meaningless statements, and even toxic information. With the development of large language models (LLMs), generating data with LLMs has broad application potential. For open-domain multimodal dialogue tasks, three drawbacks remain: 1) there is currently no unified and effective framework for collecting high-quality multimodal dialogue data; 2) LLM outputs in multimodal dialogue generation lack scene explanations, which hinders human understanding; 3) previous work has not quantitatively examined the impact of data quality on model performance. To improve data quality and reduce expenditure in the data collection process, we propose the Multimodal Data Construction Framework (MDCF). MDCF uses a modal conversion module and designs appropriate prompts for the LLM to generate well-formed, high-quality content. It also provides explanations for the multimodal dialogues, which helps in understanding conversation scenarios and facilitates subsequent manual quality inspection. Based on this framework, we release a Multimodal Open-domain Dialogue dataset with Explanation (MODE). We mainly compare against open-domain datasets such as Image-Chat. Both human evaluation and experiments show that high-quality datasets enable models to achieve greater understanding and generation capabilities.
KW - AIGC
KW - Explainability
KW - Multimodal data construction
KW - Open-domain dialogue
UR - http://www.scopus.com/inward/record.url?scp=85191808364&partnerID=8YFLogxK
U2 - 10.1007/s10489-024-05479-x
DO - 10.1007/s10489-024-05479-x
M3 - Article
AN - SCOPUS:85191808364
SN - 0924-669X
JO - Applied Intelligence
JF - Applied Intelligence
ER -