TY - GEN
T1 - T2Ranking
T2 - 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023
AU - Xie, Xiaohui
AU - Dong, Qian
AU - Wang, Bingning
AU - Lv, Feiyang
AU - Yao, Ting
AU - Gan, Weinan
AU - Wu, Zhijing
AU - Li, Xiangsheng
AU - Li, Haitao
AU - Liu, Yiqun
AU - Ma, Jin
N1 - Publisher Copyright:
© 2023 Copyright held by the owner/author(s).
PY - 2023/7/19
Y1 - 2023/7/19
N2 - Passage ranking involves two stages: passage retrieval and passage re-ranking, which are important and challenging topics for both academics and industries in the area of Information Retrieval (IR). However, the commonly-used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in terms of data scale, fine-grained relevance annotation and false negative issues. To address this problem, we introduce T2Ranking, a large-scale Chinese benchmark for passage ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Expert annotators are recruited to provide 4-level graded relevance scores (fine-grained) for query-passage pairs instead of binary relevance judgments (coarse-grained). To ease the false negative issues, more passages with higher diversities are considered when performing relevance annotations, especially in the test set, to ensure a more accurate evaluation. Apart from the textual query and passage data, other auxiliary resources are also provided, such as query types and XML files of documents which passages are generated from, to facilitate further studies. To evaluate the dataset, commonly used ranking models are implemented and tested on T2Ranking as baselines. The experimental results show that T2Ranking is challenging and there is still scope for improvement. The full data and all codes are available at https://github.com/THUIR/T2Ranking/.
AB - Passage ranking involves two stages: passage retrieval and passage re-ranking, which are important and challenging topics for both academics and industries in the area of Information Retrieval (IR). However, the commonly-used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in terms of data scale, fine-grained relevance annotation and false negative issues. To address this problem, we introduce T2Ranking, a large-scale Chinese benchmark for passage ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Expert annotators are recruited to provide 4-level graded relevance scores (fine-grained) for query-passage pairs instead of binary relevance judgments (coarse-grained). To ease the false negative issues, more passages with higher diversities are considered when performing relevance annotations, especially in the test set, to ensure a more accurate evaluation. Apart from the textual query and passage data, other auxiliary resources are also provided, such as query types and XML files of documents which passages are generated from, to facilitate further studies. To evaluate the dataset, commonly used ranking models are implemented and tested on T2Ranking as baselines. The experimental results show that T2Ranking is challenging and there is still scope for improvement. The full data and all codes are available at https://github.com/THUIR/T2Ranking/.
KW - Passage ranking
KW - Passage re-ranking
KW - Passage retrieval
KW - Search evaluation
KW - Test collection
UR - http://www.scopus.com/inward/record.url?scp=85168668676&partnerID=8YFLogxK
U2 - 10.1145/3539618.3591874
DO - 10.1145/3539618.3591874
M3 - Conference contribution
AN - SCOPUS:85168668676
T3 - SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 2681
EP - 2690
BT - SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
Y2 - 23 July 2023 through 27 July 2023
ER -