Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

Tejas Pagare, Vivek Borkar, Konstantin Avrachenkov
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:235-247, 2023.

Abstract

We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems. We experimentally compare widely used RVI Q-Learning with recently proposed Differential Q-Learning in the neural function approximation setting with Full Gradient DQN and DQN. We also extend this to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate of the proposed Full Gradient variant across different tasks.
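The abstract contrasts RVI Q-Learning with Differential Q-Learning under the average-reward criterion. As background, a minimal tabular RVI Q-learning sketch on a hypothetical two-state MDP (the MDP, step size, and reference pair `(0, 0)` are illustrative assumptions, not taken from the paper): instead of discounting, RVI Q-learning subtracts a reference value f(Q), which converges to the optimal average reward.

```python
import numpy as np

# Hypothetical 2-state, 2-action average-reward MDP (for illustration only).
# P[s, a] is the next-state distribution, R[s, a] the expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [2.0, 0.5]])

rng = np.random.default_rng(0)
Q = np.zeros((2, 2))
alpha = 0.05          # constant step size (an assumption for this sketch)
s = 0
for t in range(20000):
    a = rng.integers(2)                 # uniform exploration
    s_next = rng.choice(2, p=P[s, a])
    # RVI Q-learning update: subtract the reference value f(Q) = Q[0, 0]
    # in place of discounting, keeping Q bounded under the average-reward criterion.
    td = R[s, a] + Q[s_next].max() - Q[0, 0] - Q[s, a]
    Q[s, a] += alpha * td
    s = s_next

rho = Q[0, 0]   # f(Q) approximates the optimal average reward at convergence
```

The Full Gradient variant studied in the paper replaces the semi-gradient step used in DQN-style training with the full gradient of the empirical Bellman error; the tabular update above only illustrates the average-reward target being learned.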

Cite this Paper


BibTeX
@InProceedings{pmlr-v211-pagare23a,
  title     = {Full Gradient Deep Reinforcement Learning for Average-Reward Criterion},
  author    = {Pagare, Tejas and Borkar, Vivek and Avrachenkov, Konstantin},
  booktitle = {Proceedings of The 5th Annual Learning for Dynamics and Control Conference},
  pages     = {235--247},
  year      = {2023},
  editor    = {Matni, Nikolai and Morari, Manfred and Pappas, George J.},
  volume    = {211},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v211/pagare23a/pagare23a.pdf},
  url       = {https://proceedings.mlr.press/v211/pagare23a.html},
  abstract  = {We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems. We experimentally compare widely used RVI Q-Learning with recently proposed Differential Q-Learning in the neural function approximation setting with Full Gradient DQN and DQN. We also extend this to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate of the proposed Full Gradient variant across different tasks.}
}
Endnote
%0 Conference Paper
%T Full Gradient Deep Reinforcement Learning for Average-Reward Criterion
%A Tejas Pagare
%A Vivek Borkar
%A Konstantin Avrachenkov
%B Proceedings of The 5th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Nikolai Matni
%E Manfred Morari
%E George J. Pappas
%F pmlr-v211-pagare23a
%I PMLR
%P 235--247
%U https://proceedings.mlr.press/v211/pagare23a.html
%V 211
%X We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems. We experimentally compare widely used RVI Q-Learning with recently proposed Differential Q-Learning in the neural function approximation setting with Full Gradient DQN and DQN. We also extend this to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate of the proposed Full Gradient variant across different tasks.
APA
Pagare, T., Borkar, V. & Avrachenkov, K. (2023). Full Gradient Deep Reinforcement Learning for Average-Reward Criterion. Proceedings of The 5th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 211:235-247. Available from https://proceedings.mlr.press/v211/pagare23a.html.