Abstract
Reinforcement learning techniques such as Q-Learning, as well as the Multiple-Lookahead-Levels technique we introduced in prior work, require the agent to complete an initial exploratory path, followed by as many hypothetical and physical paths as necessary to find the optimal path to the goal. This paper introduces a reinforcement learning technique that uses a distance measure to the goal as the primary gauge for an autonomous agent's action selection. We take advantage of the first random walk to acquire initial information about the goal. Once the goal is reached, the agent's first perceived internal model of the environment is updated to include it, which the agent accomplishes by tracing its steps back to its starting point. We show that no exploratory or hypothetical paths are required after the goal is first reached or detected, and that the agent requires at most two physical paths to find the optimal path to the goal. We also introduce the agent's state occurrence frequency and use it to support the proposed Distance-Only technique; combining distance and frequency yields the Distance-and-Frequency technique. A computation-speed performance analysis shows that the Distance-and-Frequency technique requires less computation time than Q-Learning. Furthermore, we demonstrate how multiple agents using the Distance-and-Frequency technique can share knowledge of the environment, and we study the effect of that knowledge sharing on the agents' learning process.
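To make the idea concrete, the following is a minimal sketch of distance-gauged action selection supported by state occurrence frequency, assuming an obstacle-free 2-D grid and a goal position already discovered on the first random walk. The helper name `select_action` and the grid setup are illustrative assumptions, not the paper's exact algorithm:

```python
import math

def select_action(pos, goal, visits, actions):
    """Pick the move that minimizes Euclidean distance to the goal,
    breaking ties with state-visit frequency (less-visited states win).
    Hypothetical sketch; the paper's technique is defined more fully
    in the body of the article."""
    def score(a):
        nx, ny = pos[0] + a[0], pos[1] + a[1]
        dist = math.hypot(goal[0] - nx, goal[1] - ny)  # distance gauge
        freq = visits.get((nx, ny), 0)                 # occurrence frequency
        return (dist, freq)
    return min(actions, key=score)

# Assumed obstacle-free grid walk: once the goal location is known,
# the agent heads toward it greedily, penalizing revisited states.
actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
pos, goal, visits = (0, 0), (4, 3), {}
path = [pos]
while pos != goal:
    a = select_action(pos, goal, visits, actions)
    pos = (pos[0] + a[0], pos[1] + a[1])
    visits[pos] = visits.get(pos, 0) + 1
    path.append(pos)
print(path)
```

In this sketch the distance term alone suffices on an open grid; the frequency term matters when obstacles force detours, since it steers the agent away from states it has already visited often.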
About this article
Cite this article
Al-Dayaa, H.S., Megherbi, D.B. Reinforcement learning technique using agent state occurrence frequency with analysis of knowledge sharing on the agent’s learning process in multiagent environments. J Supercomput 59, 526–547 (2012). https://doi.org/10.1007/s11227-010-0451-x