Link to original content: https://api.crossref.org/works/10.1145/3512798.3512816

{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T16:52:14Z","timestamp":1726851134215},"reference-count":6,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMETRICS Perform. Eval. Rev."],"published-print":{"date-parts":[[2022,1,17]]},"abstract":"The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as multi-armed restless bandits. In this paper we develop QWI, an algorithm based on Q-learning in order to learn the Whittle indices. The key feature is the deployment of two timescales, a relatively faster one to update the state-action Q-functions, and a relatively slower one to update the Whittle indices. In our main result, we show that the algorithm converges to the Whittle indices of the problem. Numerical computations show that our algorithm converges much faster than both the standard Q-learning algorithm and neural-network based approximate Q-learning.","DOI":"10.1145\/3512798.3512816","type":"journal-article","created":{"date-parts":[[2022,1,20]],"date-time":"2022-01-20T18:13:23Z","timestamp":1642702403000},"page":"47-50","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["QWI"],"prefix":"10.1145","volume":"49","author":[{"given":"Francisco","family":"Robledo","sequence":"first","affiliation":[]},{"given":"Vivek","family":"Borkar","sequence":"additional","affiliation":[]},{"given":"Urtzi","family":"Ayesta","sequence":"additional","affiliation":[]},{"given":"Konstantin","family":"Avrachenkov","sequence":"additional","affiliation":[]}],"member":"320","published-online":{"date-parts":[[2022,1,20]]},"reference":[
{"key":"e_1_2_1_1_1","unstructured":"K. Avrachenkov and V.S. Borkar Whittle index based Q-learning for restless bandits with average reward arXiv preprint arXiv:2004.14427 (2020)"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"C. Lakshminarayanan and S. Bhatnagar A stability criterion for two timescale stochastic approximation schemes Automatica 79 108--114 (2017)","DOI":"10.1016\/j.automatica.2016.12.014"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"P. Whittle Restless bandits: Activity allocation in a changing world. Journal of applied probability 287--298 (1988)","DOI":"10.2307\/3214163"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.2307\/3214547"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"}],"container-title":["ACM SIGMETRICS Performance Evaluation Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3512798.3512816","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T23:34:30Z","timestamp":1672616070000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3512798.3512816"}},
"subtitle":["Q-learning with Whittle Index"],"short-title":[],"issued":{"date-parts":[[2022,1,17]]},"references-count":6,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,1,17]]}},"alternative-id":["10.1145\/3512798.3512816"],"URL":"http:\/\/dx.doi.org\/10.1145\/3512798.3512816","relation":{},"ISSN":["0163-5999"],"issn-type":[{"value":"0163-5999","type":"print"}],"subject":[],"published":{"date-parts":[[2022,1,17]]},"assertion":[{"value":"2022-01-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}