PAC Reinforcement Learning With an Imperfect Model

Authors

  • Nan Jiang, Microsoft Research

DOI:

https://doi.org/10.1609/aaai.v32i1.11594

Keywords:

reinforcement learning, PAC-MDP, exploration, sim-to-real transfer

Abstract

Reinforcement learning (RL) methods have proved successful in many simulated environments. The common approaches, however, are often too sample-intensive to be applied directly in the real world. A promising way to address this issue is to train an RL agent in a simulator and transfer the solution to the real environment. When a high-fidelity simulator is available, we would expect a significant reduction in the number of real trajectories needed for learning. In this work we aim to better understand the theoretical nature of this approach. We start with the perhaps surprising result that, even if the approximate model (e.g., a simulator) differs from the real environment in only a single state-action pair (but which one is unknown), such a model can be information-theoretically useless, and the sample complexity (in terms of real trajectories) still scales with the total number of states in the worst case. We examine the hard instances and identify natural conditions that rule out these pathological situations. We then propose two conceptually simple algorithms that enjoy polynomial sample complexity guarantees with no dependence on the size of the state-action space, and prove some foundational results that provide insight into this important problem.
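As a rough sketch of the setting described in the abstract (the notation below is our own and is not taken from the paper body): the real environment is an MDP $M = (S, A, P, R, \gamma)$ and the simulator is an approximate MDP $\hat{M} = (S, A, \hat{P}, \hat{R}, \gamma)$ that agrees with $M$ everywhere except at one unknown state-action pair $(s^\dagger, a^\dagger)$:

\[
\hat{P}(\cdot \mid s, a) = P(\cdot \mid s, a), \qquad \hat{R}(s, a) = R(s, a) \qquad \forall\, (s, a) \neq (s^\dagger, a^\dagger).
\]

A PAC guarantee in this setting requires that, with probability at least $1 - \delta$, the learner outputs a policy $\pi$ with $V^{\pi}_{M} \geq V^{\star}_{M} - \epsilon$ using a number of real trajectories polynomial in $1/\epsilon$, $1/\delta$, and the horizon. The lower bound mentioned above says that, absent further conditions, this number must additionally scale with $|S|$ in the worst case, even though the simulator is wrong at only a single pair.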

Published

2018-04-29

How to Cite

Jiang, N. (2018). PAC Reinforcement Learning With an Imperfect Model. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11594