跳到主要导航 跳到搜索 跳到主要内容

From Tokens to Latent States: Leveraging Pre-trained Language Models for Improving Partially Observable Reinforcement Learning

  • Meiju Li
  • , Ruixiang Sun
  • , Xin Li*
  • , Mingzhong Wang
  • *此作品的通讯作者
  • Beijing Institute of Technology
  • University of the Sunshine Coast

科研成果: 书/报告/会议事项章节会议稿件同行评审

摘要

Partially observable Markov decision processes (POMDPs) present significant challenges for reinforcement learning, as agents must learn optimal policies while maintaining belief states over unobserved environment states based on partial observations. We observe a compelling analogy: large language models (LLMs) autoregressively generate token probability distributions based on preceding context, mirroring how belief states are maintained and updated in POMDPs. This insight motivates leveraging the rich prior knowledge embedded in pre-trained LLMs for latent states estimation from observation-action histories. However, two critical challenges emerge: on the one hand, modality misalignment prevents LLMs from directly encoding visual observations and discrete actions; on the other hand, semantic misalignment exists between observation-action sequences and token sequences. To address these challenges, we introduce a novel framework ELSLLM that employs a Johnson-Lindenstrauss projection (JLP) module to transform input dimensions while preserving state similarity with theoretical guarantees, and utilizes modern Hopfield networks (MHN) to store all word embeddings from pre-trained LLMs as a knowledge repository. Through retrieval and querying mechanisms, ELSLLM achieves token-level knowledge alignment without requiring fine-tuning of the pre-trained LLMs. Extensive experiments on partially observable environments demonstrate that ELSLLM achieves state-of-the-art performance, significantly outperforming baseline methods with and without LSTM memory mechanisms. Our work opens new avenues for integrating pre-trained LLMs with reinforcement learning in partially observable settings.

源语言英语
主期刊名Proceedings of the AAAI Conference on Artificial Intelligence
编辑Sven Koenig, Chad Jenkins, Matthew E. Taylor
出版商Association for the Advancement of Artificial Intelligence
23003-23011
页数9
版本27
ISBN(印刷版)9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067
DOI
出版状态已出版 - 2026
活动40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, 新加坡
期限: 20 1月 202627 1月 2026

出版系列

姓名Proceedings of the AAAI Conference on Artificial Intelligence
编号27
40
ISSN(印刷版)2159-5399
ISSN(电子版)2374-3468

会议

会议40th AAAI Conference on Artificial Intelligence, AAAI 2026
国家/地区新加坡
Singapore
时期20/01/2627/01/26

指纹

探究 'From Tokens to Latent States: Leveraging Pre-trained Language Models for Improving Partially Observable Reinforcement Learning' 的科研主题。它们共同构成独一无二的指纹。

引用此