TY - GEN
T1 - Memory or Reasoning? Explore How LLMs Compute Mixed Arithmetic Expressions
AU - Li, Chengzhi
AU - Huang, Heyan
AU - Jian, Ping
AU - Yang, Zhen
AU - Wang, Chenxu
AU - Wang, Yifan
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Large language models (LLMs) can solve complex multi-step math reasoning problems, but little is known about how these computations are implemented internally. Many recent studies have investigated the mechanisms of LLMs on simple arithmetic tasks (e.g., a + b, a × b), but how LLMs solve mixed arithmetic tasks remains unexplored. This gap limits how well existing findings reflect real-world scenarios. In this work, we take a step further and explore how LLMs compute mixed arithmetic expressions. We find that LLMs follow a common workflow for mixed arithmetic calculations: they first parse the complete expression and then use attention heads to aggregate information to the last token position for result generation, without step-by-step reasoning along the token dimension. However, for some specific expressions, the model's generation of the final result depends on generating intermediate results at the last token position, which resembles human thinking. Furthermore, we propose a Causal Effect Driven Fine-tuning (CEDF) method that adaptively enhances the identified key components involved in mixed arithmetic calculation, improving LLMs' reasoning ability.
AB - Large language models (LLMs) can solve complex multi-step math reasoning problems, but little is known about how these computations are implemented internally. Many recent studies have investigated the mechanisms of LLMs on simple arithmetic tasks (e.g., a + b, a × b), but how LLMs solve mixed arithmetic tasks remains unexplored. This gap limits how well existing findings reflect real-world scenarios. In this work, we take a step further and explore how LLMs compute mixed arithmetic expressions. We find that LLMs follow a common workflow for mixed arithmetic calculations: they first parse the complete expression and then use attention heads to aggregate information to the last token position for result generation, without step-by-step reasoning along the token dimension. However, for some specific expressions, the model's generation of the final result depends on generating intermediate results at the last token position, which resembles human thinking. Furthermore, we propose a Causal Effect Driven Fine-tuning (CEDF) method that adaptively enhances the identified key components involved in mixed arithmetic calculation, improving LLMs' reasoning ability.
UR - https://www.scopus.com/pages/publications/105028602842
U2 - 10.18653/v1/2025.findings-acl.299
DO - 10.18653/v1/2025.findings-acl.299
M3 - Conference contribution
AN - SCOPUS:105028602842
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 5742
EP - 5763
BT - Findings of the Association for Computational Linguistics: ACL 2025
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Y2 - 27 July 2025 through 1 August 2025
ER -