PAC Reinforcement Learning With an Imperfect Model

Nan Jiang

doi:10.1609/aaai.v32i1.11594

Authors

Nan Jiang Microsoft Research

DOI:

https://doi.org/10.1609/aaai.v32i1.11594

Keywords:

reinforcement learning, PAC-MDP, exploration, sim-to-real transfer

Abstract

Reinforcement learning (RL) methods have proved to be successful in many simulated environments. The common approaches, however, are often too sample intensive to be applied directly in the real world. A promising approach to addressing this issue is to train an RL agent in a simulator and transfer the solution to the real environment. When a high-fidelity simulator is available we would expect significant reduction in the amount of real trajectories needed for learning. In this work we aim at better understanding the theoretical nature of this approach. We start with a perhaps surprising result that, even if the approximate model (e.g., a simulator) only differs from the real environment in a single state-action pair (but which one is unknown), such a model could be information-theoretically useless and the sample complexity (in terms of real trajectories) still scales with the total number of states in the worst case. We investigate the hard instances and come up with natural conditions that avoid the pathological situations. We then propose two conceptually simple algorithms that enjoy polynomial sample complexity guarantees with no dependence on the size of the state-action space, and prove some foundational results to provide insights into this important problem.

PAC Reinforcement Learning With an Imperfect Model

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information

Developed By

Subscription