Abstract
The fastest Learning Automata (LA) algorithms currently available come from the family of estimator algorithms. The Pursuit algorithm (PST), a pioneering scheme in the estimator family, obtains its superior learning speed by using Maximum Likelihood (ML) estimates to pursue the action currently perceived as being optimal. Recently, a Bayesian LA (BLA) was introduced, and empirical results that demonstrated its advantages over established top performers, including the PST scheme, were reported. The BLA scheme is inherently Bayesian in nature, but it succeeds in avoiding the computational intractability by merely relying on updating the hyper-parameters of sibling conjugate priors, and on random sampling from the resulting posteriors.
In this paper, we integrate the foundational learning principles motivating the design of the BLA, with the principles of the PST. By doing this, we have succeeded in obtaining a completely novel, and rather pioneering, approach to solving LA-like problems, namely, by designing the Bayesian Pursuit algorithm (BPST). As in the BLA, the estimates are truly Bayesian (as opposed to ML) in nature. However, the action selection probability vector of the PST is used for its exploration purposes. Also, unlike the ML estimate, which is usually a single value, the use of a posterior distribution permits us to choose any one of a spectrum of values in the posterior, as the appropriate estimate. Thus, in this paper, we have chosen a 95% percentile value of the posterior (instead of the mean) to pursue the most promising actions. Further, as advocated in [7], the pursuit has been done using both the Linear Reward-Penalty and Reward-Inaction philosophies, leading to the corresponding BPST RP and BPST RI schemes respectively.
It turns out that the BPST is superior to the PST, with the BPST RI being even more robust than the BPST RP . Moreover, by controlling the learning speed of the BPST, the BPST schemes perform either better or comparable to the BLA. We thus believe that the BPST constitutes a new avenue of research, in which the performance benefits of the PST and the BLA are mutually augmented, opening up for improved performance in a number of applications, currently being tested.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Thompson, W.R.: On the Likelihood that One Unknown Probability Exceeds Another in view of the Evidence of Two Samples. Biometrika 25, 285–294 (1933)
Thathachar, M.A.L., Sastry, P.S.: Estimator Algorithms for Learning Automata. In: Proc. of the Platinum Jubilee Conference on Systems and Signal Processing, Bangalore, India (December 1986)
Granmo, O.-C.: Solving Two-Armed Bernoulli Bandit Problems Using a Bayesian Learning Automaton. International Journal of Intelligent Computing and Cybernetics (IJICC) 3(2), 207–234 (2010)
Norheim, T., Brådland, T., Granmo, O.-C., Oommen, B.J.: A Generic Solution to Multi-Armed Bernoulli Bandit Problems Based on Random Sampling from Sibling Conjugate Priors. In: Proc. of International Conference on Agents and Artificial Intelligence, Valencia, Spain, pp. 36–44 (January 2010)
Narendra, K.S., Thathachar, M.A.L.: Learning Automata: An Introduction. Prentice-Hall, Englewood Cliffs (1989)
Thathachar, M.A.L., Sastry, P.S.: Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic Publishers, Dordrecht (2004)
Oommen, B.J., Agache, M.: Continuous and Discretized Pursuit Learning Schemes: Various Algorithms and Their Comparison. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics 31(3), 277–287 (2001)
Jamei, S.M.: Proposing a New Energy Efficient Routing Protocol for Wireless Sensor Networks. International Journal of Computer Science and Network Security 10(2), 241–245 (2010)
Navid, A.H.F.: Scalable and Energy-Aware Learning Automata-Based Routing Protocols for Wireless Sensor Networks. In: Proc. of IEEE International Conference on Sensor Technologies and Applications, Venice/Mestre, Italy, pp. 570–576 (July 2010)
Navid, A.H.F., Javadi, H.H.S.: ICLEAR: Energy aware Routing Protocol for WSN using Irregular Cellular Learning Automata. In: Proc. of IEEE Symposium on Industrial Electronics and Applications, Kuala Lumpur, pp. 463–468 (October 2009)
Song, Y., Zhang, C., Fang, Y.: Stochastic Traffic Engineering in Multihop Cognitive Wireless Mesh Networks. IEEE Transaction on Mobile Computing 9(3), 305–316 (2010)
Gao, F., Zhang, R., Liang, Y.-C., Wang, X.: Design of Learning-Based MIMO Cognitive Radio Systems. IEEE Transaction on Vehicular Technology 59(4), 1707–1720 (2010)
Mamaghani, A.S., Mahi, M., Meybodi, M.R.: A Learning Automaton Based Approach for Data Fragments Allocation in Distributed Database Systems. In: Proc. of IEEE International Conference on Computer and Information Technology, Bradford, pp. 8–12 (September 2010)
Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, X., Granmo, OC., Oommen, B.J. (2011). The Bayesian Pursuit Algorithm: A New Family of Estimator Learning Automata. In: Mehrotra, K.G., Mohan, C.K., Oh, J.C., Varshney, P.K., Ali, M. (eds) Modern Approaches in Applied Intelligence. IEA/AIE 2011. Lecture Notes in Computer Science(), vol 6704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21827-9_53
Download citation
DOI: https://doi.org/10.1007/978-3-642-21827-9_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21826-2
Online ISBN: 978-3-642-21827-9
eBook Packages: Computer ScienceComputer Science (R0)