Abstract
In a multiagent system, agents can learn their optimal policies by merging the optimal value functions that each agent has learned individually. Prior knowledge of the task is used to decompose it into several subtasks, and this decomposition greatly reduces the state and action spaces. The optimal value function of each subtask is learned with the MAXQ-Q algorithm [1]. By defining lower and upper bounds on the value function of the whole task, we propose a novel online multiagent learning algorithm, LU-Q, which accelerates the learning of coordination between multiple agents through task decomposition and action pruning.
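A minimal sketch of the kind of bound-based action pruning the abstract describes, assuming non-negative, additive subtask rewards so that each agent's individually learned value brackets the joint value from below (via the best single agent) and from above (via the optimistic sum). The function name and the specific bounds are illustrative assumptions, not the paper's actual LU-Q definitions:

```python
# Hypothetical sketch of lower/upper-bound action pruning (not the authors'
# actual LU-Q implementation, which is not reproduced on this page).
# Assumption: agent i has an individually learned value Q_i(s, a_i) for its
# subtask, and the joint value V(s, a) satisfies
#   max_i Q_i(s, a_i)  <=  V(s, a)  <=  sum_i Q_i(s, a_i).

from itertools import product

def prune_joint_actions(q_funcs, state, action_sets):
    """Return the joint actions that survive lower/upper-bound pruning.

    q_funcs[i](state, a_i) -> individually learned value for agent i.
    action_sets[i] -> iterable of actions available to agent i.
    """
    joint_actions = list(product(*action_sets))

    # Lower bound: the best any single agent can guarantee on its own.
    lower = {a: max(q(state, ai) for q, ai in zip(q_funcs, a))
             for a in joint_actions}
    # Upper bound: optimistic sum of the individually optimal values.
    upper = {a: sum(q(state, ai) for q, ai in zip(q_funcs, a))
             for a in joint_actions}

    best_lower = max(lower.values())
    # A joint action whose upper bound cannot reach the best lower bound
    # can never be optimal, so online learning need not explore it.
    return [a for a in joint_actions if upper[a] >= best_lower]
```

Under these assumptions, pruning shrinks the joint action set that the online learner must evaluate, which is one way task decomposition can accelerate coordination.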
References
Dietterich, T.G.: Hierarchical Reinforcement Learning With the MAXQ Value Function Decomposition. J. of Artificial Intelligence Research 13, 227–303 (2000)
Watkins, C.J.C.H.: Learning from Delayed Rewards. Ph.D. thesis, Cambridge University, Cambridge, UK (1989)
Singh, S., Cohn, D.: How to Dynamically Merge Markov Decision Processes. In: Advances in Neural Information Processing Systems 10 (1998)
Ghavamzadeh, M., Mahadevan, S.: A Multiagent Reinforcement Learning Algorithm by Dynamically Merging Markov Decision Processes. In: 1st Int. Joint Conference on Autonomous Agents and Multiagent Systems, Bologna (2002)
Boutilier, C.: Sequential Optimality and Coordination in Multiagent Systems. In: 16th Int. Joint Conference on Artificial Intelligence, Stockholm, pp. 478–485 (1999)
Littman, M.L.: Markov Games as a Framework for Multi-Agent Reinforcement Learning. In: 11th Int. Conference on Machine Learning, New Brunswick, pp. 157–163 (1994)
Hu, J., Wellman, M.P.: Nash Q-Learning for General-Sum Stochastic Games. J. of Machine Learning Research 4, 1039–1069 (2003)
Greenwald, A., Hall, K., Serrano, R.: Correlated-Q Learning. In: NIPS Workshop on Multiagent Learning (2002)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, H., Huang, S. (2004). Merging Individually Learned Optimal Results to Accelerate Coordination. In: Li, Q., Wang, G., Feng, L. (eds) Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, vol 3129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27772-9_64
DOI: https://doi.org/10.1007/978-3-540-27772-9_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22418-1
Online ISBN: 978-3-540-27772-9