Procedural Memory Augmented Deep Reinforcement Learning

  • Ying Ma*
  • Joseph Brooks
  • Hongming Li
  • Jose C. Principe

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Inspired by the human brain, we propose an external memory-augmented decision-making architecture for video processing. A self-organizing object detector serves as a front end that deconstructs the environment by extracting events from the flow of time and detecting objects within the frames. An additional working memory temporarily stores these objects so that the system can extract task-relevant properties from them. We propose a deep reinforcement learning (RL) neural network to learn affordances, i.e., sequences of actions that manipulate these objects. The RL network and the object detector are trained alternately. Once both are trained, the objects and their affordances are transferred to an external memory and reused whenever the same objects are detected in input frames. Here, the external memory combines a dictionary with a linked list, so it can be accessed either by content or by temporal order; this dual access is motivated by the temporal character of human procedural memory. The proposed memory-augmented RL framework offers transferability, explainability, and computational efficiency relative to conventional deep learning architectures. We validate the framework on the video game Super Mario Brothers, showing superiority over several classical deep RL architectures and exemplifying these three advantages.

Impact Statement: Reinforcement learning is critical to the design of next-generation machine learning algorithms because it reduces labeling requirements. However, the method still requires a considerable number of interactions with the environment, and the learned network cannot be generalized to other environments because of catastrophic forgetting, so it remains impractical in many settings. This paper proposes a different approach to stochastic search inspired by cognitive science. The goal is to deconstruct the world into objects, store them in external memory, and learn object properties as they interact with the agent. Preliminary results show that this approach decreases the number of interactions with the environment while maintaining performance and improving generalizability to other environments.
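The dual-access external memory described in the abstract — a dictionary for content-based retrieval plus a linked list preserving temporal order — can be sketched as follows. This is a minimal illustrative data structure, not the authors' implementation; the class and method names (`ProceduralMemory`, `store`, `recall`, `replay`) and the string-keyed objects are assumptions for the example.

```python
class ProceduralMemory:
    """Toy external memory: a dictionary for content-addressable lookup
    combined with a singly linked list that preserves the temporal order
    in which object/affordance pairs were stored (illustrative sketch)."""

    class _Node:
        def __init__(self, key, affordance):
            self.key = key                # object identity (content)
            self.affordance = affordance  # learned action sequence
            self.next = None              # temporal successor

    def __init__(self):
        self._by_content = {}  # content-based access: key -> node
        self._head = None      # temporal access: oldest entry first
        self._tail = None

    def store(self, key, affordance):
        """Insert an object and its affordance, appending in temporal order."""
        node = self._Node(key, affordance)
        self._by_content[key] = node
        if self._tail is None:
            self._head = node
        else:
            self._tail.next = node
        self._tail = node

    def recall(self, key):
        """Content-based retrieval: affordance for a detected object, or None."""
        node = self._by_content.get(key)
        return node.affordance if node else None

    def replay(self):
        """Temporal retrieval: yield (object, affordance) in storage order."""
        node = self._head
        while node is not None:
            yield node.key, node.affordance
            node = node.next


# Hypothetical usage with Super Mario Brothers-style objects:
memory = ProceduralMemory()
memory.store("goomba", "jump_on")
memory.store("pipe", "jump_over")
print(memory.recall("goomba"))          # content access -> "jump_on"
print([k for k, _ in memory.replay()])  # temporal access -> ["goomba", "pipe"]
```

The same dual access could be approximated with Python's insertion-ordered `dict`; the explicit linked list is used here only to mirror the structure named in the abstract.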

Original language: English
Pages (from-to): 105-120
Number of pages: 16
Journal: IEEE Transactions on Artificial Intelligence
Volume: 1
Issue number: 2
DOIs
Publication status: Published - Oct 2020
Externally published: Yes

Keywords

  • Deep reinforcement learning
  • external memory
  • procedural memory
