Abstract
Open-domain automatic dialogue evaluation plays an important role in dialogue systems. While recent efforts focus on making learning-based evaluation metrics correlate better with human evaluation, robust metrics for parallel corpora and multiple domains remain unexplored. Parallel corpora refer to corpora that express the same idea in different ways (e.g., translation, paraphrasing, and back-translation). In this paper, we propose the Parallel Corpora Alignment Framework (PCAF), which improves the consistency and robustness of model evaluation on parallel corpora. First, parallel corpora are aligned in semantic space through parallel-corpora-aligned contrastive learning. Then, parallel-corpora-aligned distillation on multiple datasets is applied to further improve the model’s generalization ability across data domains. Our approach ranks second on the final test data of DSTC11 track 4 sub-task 1 ("Multilingual Automatic Evaluation Metrics", turn-level) and third on sub-task 2 ("Robust Automatic Evaluation Metrics", turn-level), which demonstrates the strong generalization ability and robustness of our proposed approach.
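The abstract does not include code, but the first stage it describes is a contrastive alignment of parallel utterances in semantic space. The sketch below is a minimal, hypothetical illustration of such an objective, assuming a standard InfoNCE-style loss: each utterance embedding is pulled toward the embedding of its parallel variant (paraphrase, translation, or back-translation), with the other pairs in the batch serving as negatives. The function name, temperature value, and embedding size are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of parallel-corpora-aligned contrastive learning
# (a symmetric InfoNCE objective over paired utterance embeddings).
import torch
import torch.nn.functional as F

def parallel_alignment_loss(orig_emb: torch.Tensor,
                            para_emb: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """Align embeddings of utterances with their parallel variants.

    orig_emb, para_emb: (batch, dim) encoder outputs, where row i of
    para_emb is the parallel variant of row i of orig_emb (positive pair);
    all other rows in the batch act as in-batch negatives.
    """
    orig = F.normalize(orig_emb, dim=-1)
    para = F.normalize(para_emb, dim=-1)
    logits = orig @ para.t() / temperature          # (batch, batch) similarities
    labels = torch.arange(orig.size(0), device=orig.device)
    # Symmetric cross-entropy: each utterance must retrieve its parallel twin,
    # and vice versa.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

# Example: align a batch of 8 utterance pairs with 768-dim embeddings.
loss = parallel_alignment_loss(torch.randn(8, 768), torch.randn(8, 768))
```

The second stage (parallel-corpora-aligned distillation across multiple datasets) would analogously match a student metric's predictions to a teacher's across parallel variants of the same dialogue, but its concrete form is specific to the paper and is not sketched here.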
| Original language | English |
| --- | --- |
| Pages | 123-132 |
| Number of pages | 10 |
| Publication status | Published - 2023 |
| Event | 11th Dialog System Technology Challenge, DSTC 2023, Prague, Czech Republic. Duration: 11 Sept 2023 → … |
Conference
| Conference | 11th Dialog System Technology Challenge, DSTC 2023 |
| --- | --- |
| Country/Territory | Czech Republic |
| City | Prague |
| Period | 11/09/23 → … |