TY - GEN
T1 - Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning
T2 - 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025
AU - Wang, Xinglin
AU - Feng, Shaoxiong
AU - Li, Yiwei
AU - Yuan, Peiwen
AU - Zhang, Yueqi
AU - Tan, Chuyi
AU - Pan, Boyuan
AU - Hu, Yao
AU - Li, Kan
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
AB - Self-consistency (SC), a widely used decoding strategy for chain-of-thought reasoning, yields significant gains across multi-step reasoning tasks, but at a high cost: it draws a preset number of samples for every query. Its variants, Adaptive Self-Consistency (ASC) and Early-Stopping Self-Consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a set of pre-samples, reducing the cost of SC with minimal impact on performance. Neither method, however, exploits prior information about question difficulty, which often results in unnecessary repeated sampling for easy questions that could be answered accurately in a single attempt, wasting resources. To tackle this problem, we propose Difficulty-Adaptive Self-Consistency (DSC), which leverages the difficulty information of batch queries from both prior and posterior perspectives to adaptively allocate inference resources, further reducing the overall cost of SC. To demonstrate the effectiveness of DSC, we conduct extensive experiments on three popular categories of reasoning tasks (arithmetic, commonsense, and symbolic reasoning) across six benchmarks. The empirical results show that DSC consistently surpasses the strong baselines ASC and ESC in cost by a significant margin while attaining comparable performance.
UR - https://www.scopus.com/pages/publications/105028694717
U2 - 10.18653/v1/2025.findings-naacl.383
DO - 10.18653/v1/2025.findings-naacl.383
M3 - Conference contribution
AN - SCOPUS:105028694717
T3 - 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings, NAACL 2025
SP - 6919
EP - 6932
BT - 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics
A2 - Chiruzzo, Luis
A2 - Ritter, Alan
A2 - Wang, Lu
PB - Association for Computational Linguistics (ACL)
Y2 - 29 April 2025 through 4 May 2025
ER -
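
The abstract above contrasts three sampling strategies (SC, ESC, DSC). The following is a minimal Python sketch, appended after the record, of how those strategies relate; it is not the authors' implementation. The function sample_answer, the unanimous-window stopping rule's parameters, and the prior difficulty score in difficulty_adaptive_sc are all illustrative assumptions made so the sketch is self-contained and runnable.

```python
# Hypothetical sketch of the self-consistency family described in the
# abstract above; NOT the paper's released code. `sample_answer` mocks one
# chain-of-thought sample from an LLM so the file runs standalone.
import random
from collections import Counter


def sample_answer(question: str) -> str:
    """Hypothetical stand-in: one sampled reasoning chain, reduced to its
    final answer string."""
    return random.choice(["42", "42", "42", "17"])


def self_consistency(question: str, n: int = 40) -> str:
    """Vanilla SC: always draw a preset number of samples, then majority-vote."""
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]


def early_stopping_sc(question: str, window: int = 5, max_n: int = 40) -> str:
    """ESC-style posterior stopping: sample in small windows and stop as soon
    as one window is unanimous, i.e. the answer distribution looks peaked."""
    votes: Counter = Counter()
    while sum(votes.values()) < max_n:
        batch = [sample_answer(question) for _ in range(window)]
        votes.update(batch)
        if len(set(batch)) == 1:  # unanimous window -> confident, stop early
            break
    return votes.most_common(1)[0][0]


def difficulty_adaptive_sc(question: str, difficulty: float,
                           window: int = 5, max_n: int = 40) -> str:
    """DSC-flavored allocation: a prior difficulty score in [0, 1] sizes the
    sampling budget before any samples are drawn, so an easy question may get
    a single attempt; the remaining budget still stops early on consensus."""
    budget = max(1, round(difficulty * max_n))
    if budget == 1:
        return sample_answer(question)  # prior says one attempt suffices
    return early_stopping_sc(question, window=min(window, budget), max_n=budget)


if __name__ == "__main__":
    q = "What is 6 * 7?"
    print("SC :", self_consistency(q))
    print("ESC:", early_stopping_sc(q))
    print("DSC:", difficulty_adaptive_sc(q, difficulty=0.1))
```

The sketch mirrors the cost ordering the abstract claims: SC always spends the full preset budget, ESC cuts samples once the posterior over answers is already peaked, and the DSC-style variant additionally uses a prior difficulty estimate so easy queries can be answered in one attempt. How that prior is obtained for batch queries is the paper's contribution and is not modeled here.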