Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

Zhou, Runlong; Zhang, Zihan; Du, Simon S.


arXiv:2301.13446 (cs)

[Submitted on 31 Jan 2023 (v1), last revised 21 May 2023 (this version, v3)]



Abstract: We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying constant regret on deterministic MDPs). The existing algorithms are either variance-independent or suboptimal. We first propose two new environment norms to characterize the fine-grained variance properties of the environment. For model-based methods, we design a variant of the MVP algorithm (Zhang et al., 2021a). We apply new analysis techniques to demonstrate that this algorithm enjoys variance-dependent bounds with respect to the norms we propose. In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs, the first result of its kind. We further initiate the study of model-free algorithms with variance-dependent regret bounds by designing a reference-function-based algorithm with a novel capped-doubling reference update schedule. Lastly, we provide lower bounds to complement our upper bounds.
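As a rough illustration of why deterministic environments are the low-variance extreme, the sketch below computes the one-step variance of a value function under a next-state distribution. It is not the paper's algorithm, and it does not implement the two environment norms, which the abstract does not define. The function name `next_state_value_variance` and the toy two-state value vector are assumptions made purely for illustration.

```python
def next_state_value_variance(probs, values):
    """Variance of V(s') when s' is drawn from `probs` for one (state, action) pair.

    probs:  next-state probabilities, summing to 1 (hypothetical toy input)
    values: value of each next state, in the same order as probs
    """
    mean = sum(p * v for p, v in zip(probs, values))
    second_moment = sum(p * v * v for p, v in zip(probs, values))
    # Clip at zero to guard against tiny negative floating-point round-off.
    return max(second_moment - mean * mean, 0.0)


# Toy value function over two next states (illustrative numbers only).
values = [0.0, 1.0]

# Deterministic transition: all probability mass on one next state.
var_deterministic = next_state_value_variance([0.0, 1.0], values)

# Stochastic transition: uniform over both next states.
var_stochastic = next_state_value_variance([0.5, 0.5], values)

print(var_deterministic, var_stochastic)  # prints "0.0 0.25"
```

A deterministic transition always yields zero variance here, whatever the value function, which is the kind of structure a variance-dependent bound is meant to exploit.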

Comments: ICML 2023
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2301.13446 [cs.LG]
(or arXiv:2301.13446v3 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2301.13446

Submission history

From: Runlong Zhou
[v1] Tue, 31 Jan 2023 06:54:06 UTC (169 KB)
[v2] Wed, 26 Apr 2023 21:26:02 UTC (170 KB)
[v3] Sun, 21 May 2023 20:44:06 UTC (168 KB)
