An offline actor-critic policy improvement algorithm with historical state-action pairs

Abstract
Low state-action coverage (SACo) and abundant random behaviors in datasets pose significant challenges for offline reinforcement learning (RL). To mitigate the impact of dataset quality on offline RL, we propose an offline actor-critic policy improvement algorithm with historical state-action pairs (PIH). By applying a Box-Cox transformation to the logarithmic probabilities of dataset samples to obtain the offline policy gradient, PIH avoids the extrapolation errors that existing offline RL methods tend to produce when learning policies from low-quality datasets. This enables efficient and stable policy evaluation and improvement using the same state-action pairs in the dataset, even when the dataset contains abundant random behaviors or its SACo is low. To compute the advantage functions used in the offline policy gradient, a unified critic network is designed to jointly approximate the state-value and action-value functions, enhancing policy learning. Extensive experiments on datasets from six benchmark environments demonstrate that, among state-of-the-art algorithms (CQL, TD3+BC, IQL, AWAC, etc.), only PIH learns policies efficiently and stably from datasets with low SACo and high randomness. Moreover, in evaluations on random-zero datasets, PIH achieves a 15.5% improvement in average return over the mean performance of the other algorithms.
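The abstract names a Box-Cox transformation of logarithmic probabilities as the key ingredient of the offline policy gradient. As a minimal sketch of the transform itself (how PIH shifts, parameterizes, and weights it is not specified in the abstract; the use of negative log-probabilities to satisfy the transform's positivity requirement and the choice of λ below are assumptions):

```python
import numpy as np

def box_cox(x, lam):
    """Standard Box-Cox transform; x must be strictly positive.

    For lam != 0: (x**lam - 1) / lam; for lam == 0: log(x).
    """
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

# Hypothetical use: log-probabilities of dataset actions are negative, so
# their negations are positive and admissible inputs to the transform.
log_probs = np.array([-0.5, -1.2, -2.3])
transformed = box_cox(-log_probs, lam=0.5)
```

The transform interpolates smoothly between the identity (λ = 1) and the logarithm (λ → 0), which is one plausible reason to use it for reshaping the sample weighting in a policy gradient.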
| Original language | English |
|---|---|
| Article number | 8 |
| Journal | International Journal of Machine Learning and Cybernetics |
| Volume | 17 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - Jan 2026 |
| Externally published | Yes |
Keywords
- Extrapolation errors
- Intelligent control
- Intelligent decision
- Offline reinforcement learning
- Policy gradient