TY - JOUR
T1 - LLM-Guided Reinforcement Learning for Interactive Environments
AU - Yang, Fuxue
AU - Liu, Jiawen
AU - Li, Kan
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/6
Y1 - 2025/6
N2 - We propose LLM-Guided Reinforcement Learning (LGRL), a novel framework that leverages large language models (LLMs) to decompose high-level objectives into a sequence of manageable subgoals in interactive environments. Our approach decouples high-level planning from low-level action execution by dynamically generating context-aware subgoals that guide the reinforcement learning (RL) agent. During training, intermediate subgoals, each associated with partial rewards, are generated based on the agent's current progress, providing fine-grained feedback that facilitates structured exploration and accelerates convergence. At inference, a chain-of-thought strategy enables the LLM to adaptively update subgoals in response to evolving environmental states. Although demonstrated in a representative interactive setting, our method generalizes to a wide range of complex, goal-oriented tasks. Experimental results show that LGRL achieves higher success rates, improved efficiency, and faster convergence compared with baseline approaches.
AB - We propose LLM-Guided Reinforcement Learning (LGRL), a novel framework that leverages large language models (LLMs) to decompose high-level objectives into a sequence of manageable subgoals in interactive environments. Our approach decouples high-level planning from low-level action execution by dynamically generating context-aware subgoals that guide the reinforcement learning (RL) agent. During training, intermediate subgoals, each associated with partial rewards, are generated based on the agent's current progress, providing fine-grained feedback that facilitates structured exploration and accelerates convergence. At inference, a chain-of-thought strategy enables the LLM to adaptively update subgoals in response to evolving environmental states. Although demonstrated in a representative interactive setting, our method generalizes to a wide range of complex, goal-oriented tasks. Experimental results show that LGRL achieves higher success rates, improved efficiency, and faster convergence compared with baseline approaches.
KW - chain of thought
KW - large language models
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=105008936133&partnerID=8YFLogxK
U2 - 10.3390/math13121932
DO - 10.3390/math13121932
M3 - Article
AN - SCOPUS:105008936133
SN - 2227-7390
VL - 13
JO - Mathematics
JF - Mathematics
IS - 12
M1 - 1932
ER -