
Reinforcement learning technique using agent state occurrence frequency with analysis of knowledge sharing on the agent’s learning process in multiagent environments

Published in The Journal of Supercomputing

Abstract

Reinforcement learning techniques such as Q-Learning, as well as the Multiple-Lookahead-Levels technique we introduced in our prior work, require the agent to complete an initial exploratory path followed by as many hypothetical and physical paths as necessary to find the optimal path to the goal. This paper introduces a reinforcement learning technique that uses a distance measure to the goal as the primary gauge for an autonomous agent's action selection. We take advantage of the first random walk to acquire initial information about the goal. Once the goal is reached, the agent's first perceived internal model of the environment is updated to reflect and include that goal: the agent traces its steps back to its original starting point. We show that no exploratory or hypothetical paths are required after the goal is initially reached or detected, and that the agent requires at most two physical paths to find the optimal path to the goal. The agent's state occurrence frequency is also introduced and used to support the proposed Distance-Only technique. A computation speed performance analysis is carried out, and the Distance-and-Frequency technique is shown to require less computation time than Q-Learning. Furthermore, we present and demonstrate how multiple agents using the Distance-and-Frequency technique can share knowledge of the environment, and we study the effect of that knowledge sharing on the agents' learning process.
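As a rough illustration of the ideas summarized above, the following Python sketch shows one way a distance-guided, frequency-tie-broken action selection and a simple knowledge-sharing step could be realized in a small grid world. It is not the authors' implementation: the class and function names (GridAgent, merge_knowledge), the tie-breaking rule, and the assumption that all agents seek a common goal are illustrative choices, and the paper's model-update machinery is omitted.

```python
# Minimal sketch (not the authors' implementation) of distance-and-frequency
# action selection with simple knowledge sharing. Names and the tie-breaking
# rule are illustrative assumptions made for this example only.

import random
from collections import defaultdict

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

class GridAgent:
    def __init__(self, start, goal, grid_size):
        self.start, self.goal, self.grid_size = start, goal, grid_size
        self.distance = {}                 # state -> estimated steps to goal
        self.frequency = defaultdict(int)  # state -> occurrence count

    def in_bounds(self, s):
        return 0 <= s[0] < self.grid_size and 0 <= s[1] < self.grid_size

    def random_walk(self):
        """First exploratory path: wander until the goal is found, then trace the
        path back so every visited state receives a distance-to-goal estimate."""
        s, path = self.start, [self.start]
        while s != self.goal:
            moves = [(s[0] + a[0], s[1] + a[1]) for a in ACTIONS]
            s = random.choice([m for m in moves if self.in_bounds(m)])
            path.append(s)
            self.frequency[s] += 1
        for steps_left, state in enumerate(reversed(path)):
            # keep the smallest steps-to-goal seen for states visited more than once
            self.distance[state] = min(self.distance.get(state, steps_left), steps_left)

    def choose_action(self, s):
        """Greedy step: move to the neighbour with the smallest known distance to the
        goal; among equally distant neighbours prefer the least frequently visited."""
        neighbours = [(s[0] + a[0], s[1] + a[1]) for a in ACTIONS]
        neighbours = [n for n in neighbours if self.in_bounds(n)]
        return min(neighbours,
                   key=lambda n: (self.distance.get(n, float("inf")), self.frequency[n]))

def merge_knowledge(agents):
    """Illustrative knowledge sharing (assumes all agents seek the same goal):
    pool distance estimates, keeping the smallest known value for each state."""
    shared = {}
    for agent in agents:
        for state, d in agent.distance.items():
            shared[state] = min(shared.get(state, d), d)
    for agent in agents:
        agent.distance.update(shared)

if __name__ == "__main__":
    agents = [GridAgent(start, goal=(7, 7), grid_size=8) for start in [(0, 0), (7, 0)]]
    for a in agents:
        a.random_walk()           # each agent's first exploratory path
    merge_knowledge(agents)       # agents share what they learned
    s, steps = agents[0].start, 0
    while s != agents[0].goal:    # greedy walk guided by shared distances
        s = agents[0].choose_action(s)
        steps += 1
    print("greedy path length after sharing:", steps)
```

In this sketch, the first random walk assigns each visited state a distance-to-goal estimate by tracing the path back from the goal; later walks simply step to the neighbour with the smallest known estimate, consulting the state occurrence frequency only to break ties, and merge_knowledge lets agents pool their most optimistic distance estimates.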



Author information


Corresponding author

Correspondence to D. B. Megherbi.


About this article

Cite this article

Al-Dayaa, H.S., Megherbi, D.B. Reinforcement learning technique using agent state occurrence frequency with analysis of knowledge sharing on the agent’s learning process in multiagent environments. J Supercomput 59, 526–547 (2012). https://doi.org/10.1007/s11227-010-0451-x
