T2Ranking: A Large-scale Chinese Benchmark for Passage Ranking

Xiaohui Xie, Qian Dong, Bingning Wang, Feiyang Lv, Ting Yao, Weinan Gan, Zhijing Wu, Xiangsheng Li, Haitao Li, Yiqun Liu, Jin Ma

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

4 Citations (Scopus)

Abstract

Passage ranking involves two stages: passage retrieval and passage re-ranking, which are important and challenging topics for both academia and industry in the area of Information Retrieval (IR). However, the commonly used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in terms of data scale, fine-grained relevance annotation and false-negative issues. To address this problem, we introduce T2Ranking, a large-scale Chinese benchmark for passage ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Expert annotators are recruited to provide 4-level graded relevance scores (fine-grained) for query-passage pairs instead of binary relevance judgments (coarse-grained). To ease the false-negative issues, more passages with higher diversity are considered when performing relevance annotations, especially in the test set, to ensure a more accurate evaluation. Apart from the textual query and passage data, other auxiliary resources are also provided, such as query types and XML files of the documents from which the passages are generated, to facilitate further studies. To evaluate the dataset, commonly used ranking models are implemented and tested on T2Ranking as baselines. The experimental results show that T2Ranking is challenging and there is still scope for improvement. The full data and all code are available at https://github.com/THUIR/T2Ranking/.

Original language: English
Title of host publication: SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 2681-2690
Number of pages: 10
ISBN (electronic): 9781450394086
DOI: https://doi.org/10.1145/3539618.3591874
Publication status: Published - 19 Jul 2023
Event: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023 - Taipei, Taiwan, China
Duration: 23 Jul 2023 to 27 Jul 2023

Publication series

Name: SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023
Country/Territory: Taiwan, China
City: Taipei
Period: 23/07/23 to 27/07/23

Cite this

Xie, X., Dong, Q., Wang, B., Lv, F., Yao, T., Gan, W., Wu, Z., Li, X., Li, H., Liu, Y., & Ma, J. (2023). T2Ranking: A Large-scale Chinese Benchmark for Passage Ranking. In SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2681-2690). (SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery, Inc. https://doi.org/10.1145/3539618.3591874