Abstract
Autonomous systems need to be able to adapt dynamically to changing requirements and environmental conditions, without redeployment and without interrupting the system's functionality. The EU project ASCENS has developed a comprehensive suite of foundational theories and methods for building autonomic systems. In this paper we specialise the Ensemble Development Life Cycle (EDLC) process model of ASCENS to incorporate planning and reinforcement learning techniques. We present the “AIDL” life cycle and illustrate it with two case studies: simulation-based online planning and the PSyCo reinforcement learning approach for synthesizing agent policies from hard and soft requirements. Related work and potential avenues for future research are discussed.
Dedicated to Manuel Hermenegildo.
Notes
- 1. Also called internal model or simulation model in the literature.
References
ASCENS: Autonomic Component Ensembles. Integrated Project, 01 Oct 2010–31 Mar 2015, Grant agreement no: 257414, EU 7th Framework Programme. http://www.ascens-ist.eu/. Accessed 21 April 2020
Gartner Inc.: Market Guide for AIOps Platforms (2019). https://www.bmc.com/forms/tools-and-strategies-for-effective-aiops.html. Accessed 07 Oct 2020
Google Cloud Solutions: MLOps: Continuous delivery and automation pipelines in machine learning. https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning. Accessed 07 Oct 2020
OpenAI: Spinning Up in Deep RL! Part 2: Kinds of RL Algorithms (2018). https://spinningup.openai.com. Accessed 07 July 2020
Abeywickrama, D., Bicocchi, N., Mamei, M., Zambonelli, F.: The SOTA approach to engineering collective adaptive systems. Int. J. Softw. Tools Technol. Transf. 22(4), 399–415 (2020)
Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: AAAI, pp. 2669–2678. AAAI Press (2018)
Altman, E.: Constrained Markov Decision Processes, vol. 7. CRC Press, Boca Raton (1999)
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. CoRR, abs/1606.06565 (2016)
Beavis, B., Dobbs, I.: Optimisation and Stability Theory for Economic Analysis. Cambridge University Press, Cambridge (1990)
Belzner, L., Hennicker, R., Wirsing, M.: OnPlan: a framework for simulation-based online planning. In: Braga, C., Ölveczky, P.C. (eds.) FACS 2015. LNCS, vol. 9539, pp. 1–30. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-28934-2_1
Belzner, L., Hölzl, M.M., Koch, N., Wirsing, M.: Collective autonomic systems: towards engineering principles and their foundations. Trans. Found. Mastering Chang. 1, 180–200 (2016)
Belzner, L., Wirsing, M.: Synthesizing safe policies under probabilistic constraints with reinforcement learning and Bayesian model checking. Sci. Comput. Program. 206, 102620 (2021)
Bernardo, M., De Nicola, R., Hillston, J. (eds.): Formal Methods for the Quantitative Evaluation of Collective Adaptive Systems. SFM 2016. LNCS, vol. 9700. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34096-8
Bresciani, P., Perini, A., Giorgini, P., Giunchiglia, F., Mylopoulos, J.: Tropos: an agent-oriented software development methodology. JAAMAS 8(3), 203–236 (2004)
Browne, C., et al.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)
Brun, Y., et al.: Engineering self-adaptive systems through feedback loops. In: Cheng, B.H.C., de Lemos, R., Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for Self-Adaptive Systems. LNCS, vol. 5525, pp. 48–70. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02161-9_3
Bureš, T., et al.: A life cycle for the development of autonomic systems: the e-mobility showcase. In: SASO Workshops, pp. 71–76 (2013)
Clavera, I., Rothfuss, J., Schulman, J., Fujita, Y., Asfour, T., Abbeel, P.: Model-based reinforcement learning via meta-policy optimization. In: CoRL 2018, Proceedings of Machine Learning Research, vol. 87, pp. 617–629. PMLR (2018)
De Nicola, R., Loreti, M., Pugliese, R., Tiezzi, F.: A formal approach to autonomic systems programming: the SCEL language. ACM Trans. Auton. Adapt. Syst. 9(2), 7:1–7:29 (2014)
Drugan, M.M.: Reinforcement learning versus evolutionary computation: a survey on hybrid algorithms. Swarm Evol. Comput. 44, 228–246 (2019)
Fernandez-Marquez, J.L., Serugendo, G.D.M., Montagna, S., Viroli, M., Arcos, J.L.: Description and composition of bio-inspired design patterns: a complete overview. Nat. Comput. 12(1), 43–67 (2013)
Gabor, T., et al.: The scenario coevolution paradigm: adaptive quality assurance for adaptive systems. Int. J. Softw. Tools Technol. Transf. 22, 457–476 (2020)
Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal Asp. Comput. 6(5), 512–535 (1994)
Hasanbeig, M., Abate, A., Kroening, D.: Cautious reinforcement learning with logical constraints. In: AAMAS, pp. 483–491. International Foundation for Autonomous Agents and Multiagent Systems (2020)
Hoch, N., Bensler, H.-P., Abeywickrama, D., Bureš, T., Montanari, U.: The E-mobility case study. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 513–533. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_17
Horn, P.: Autonomic computing: IBM perspective on the state of information technology. IBM T.J. Watson Labs, NY (2001)
Hölzl, M., Koch, N., Puviani, M., Wirsing, M., Zambonelli, F.: The ensemble development life cycle and best practices for collective autonomic systems. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 325–354. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_9
Hölzl, M., Rauschmayer, A., Wirsing, M.: Engineering of software-intensive systems: state of the art and research challenges. In: Wirsing, M., Banâtre, J.-P., Hölzl, M., Rauschmayer, A. (eds.) Software-Intensive Systems and New Computing Paradigms. LNCS, vol. 5380, pp. 1–44. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89437-7_1
Hölzl, M., Wirsing, M.: Towards a system model for ensembles. In: Agha, G., Danvy, O., Meseguer, J. (eds.) Formal Modeling: Actors, Open Systems, Biological Systems. LNCS, vol. 7000, pp. 241–261. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24933-4_12
IBM: An architectural blueprint for autonomic computing. Technical report, IBM Corporation (2005)
Inverardi, P., Mori, M.: A software lifecycle process to support consistent evolutions. In: de Lemos, R., Giese, H., Müller, H.A., Shaw, M. (eds.) Software Engineering for Self-Adaptive Systems II. LNCS, vol. 7475, pp. 239–264. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35813-5_10
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)
Kernbach, S., Schmickl, T., Timmis, J.: Collective adaptive systems: challenges beyond evolvability. CoRR abs/1108.5643 (2011)
Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29
Krutisch, R., Meier, P., Wirsing, M.: The AgentComponent approach, combining agents, and components. In: Schillo, M., Klusch, M., Müller, J., Tianfield, H. (eds.) MATES 2003. LNCS (LNAI), vol. 2831, pp. 1–12. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39869-1_1
Legay, A., Delahaye, B., Bensalem, S.: Statistical model checking: an overview. In: Barringer, H., et al. (eds.) RV 2010. LNCS, vol. 6418, pp. 122–135. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16612-9_11
Loreti, M., Hillston, J.: Modelling and analysis of collective adaptive systems with CARMA and its tools. In: Bernardo, M., De Nicola, R., Hillston, J. (eds.) SFM 2016. LNCS, vol. 9700, pp. 83–119. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34096-8_4
Mayer, P., et al.: The autonomic cloud. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 495–512. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_16
Moerland, T.M., Broekens, J., Jonker, C.M.: A framework for reinforcement learning and planning. CoRR, abs/2006.15009 (2020)
Moerland, T.M., Broekens, J., Jonker, C.M.: Model-based reinforcement learning: a survey. CoRR, abs/2006.16712 (2020)
Moerland, T.M., Deichler, A., Baldi, S., Broekens, J., Jonker, C.M.: Think too fast nor too slow: the computational trade-off between planning and reinforcement learning. CoRR, abs/2005.07404 (2020)
Nagabandi, A., et al.: Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In: ICLR 2019. OpenReview.net (2019)
Ong, S.C.W., Png, S.W., Hsu, D., Lee, W.S.: Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robot. Res. 29(8), 1053–1068 (2010)
Pinciroli, C., Bonani, M., Mondada, F., Dorigo, M.: Adaptation and awareness in robot ensembles: scenarios and algorithms. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 471–494. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_15
Puviani, M., Cabri, G., Zambonelli, F.: Patterns for self-adaptive systems: agent-based simulations. EAI Endorsed Trans. Self-Adapt. Syst. 1(1), e4 (2015)
Rao, A.S., Georgeff, M.P.: Modeling rational agents within a BDI-architecture. In: Principles of Knowledge Representation and Reasoning (KR 1991), pp. 473–484 (1991)
Ray, A., Achiam, J., Amodei, D.: Benchmarking safe exploration in deep reinforcement learning. Technical report, Open AI (2019)
Roche, J.: Adopting DevOps practices in quality assurance. Commun. ACM 56(11), 38–43 (2013)
Ross, S., Pineau, J., Paquet, S., Chaib-draa, B.: Online planning algorithms for POMDPs. J. Artif. Intell. Res. 32, 663–704 (2008)
Sebastio, S., Vandin, A.: MultiVeStA: statistical model checking for discrete event simulators. In: ValueTools 2013, pp. 310–315. ICST/ACM (2013)
Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550(7676), 354–359 (2017)
Sutton, R.S., Barto, A.G.: Reinforcement Learning - an Introduction. Adaptive Computation and Machine Learning, 2nd edn. MIT Press, Cambridge (2018)
Szepesvári, C.: Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, pp. 1–103. Morgan & Claypool Publishers, California (2010)
Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res. 10, 1633–1685 (2009)
Thrun, S., Pratt, L.Y.: Learning to learn: introduction and overview. In: Thrun, S., Pratt, L.Y. (eds.) Learning to Learn, pp. 3–17. Springer, Boston (1998). https://doi.org/10.1007/978-1-4615-5529-2_1
Tschaikowski, M., Tribastone, M.: A unified framework for differential aggregations in Markovian process algebra. J. Log. Alg. Meth. Prog. 84(2), 238–258 (2015)
Vassev, E., Hinchey, M.: Engineering requirements for autonomy features. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 379–403. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_11
Vilalta, R., Giraud-Carrier, C., Brazdil, P., Soares, C.: Inductive transfer. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning and Data Mining, pp. 666–671. Springer, Boston (2017). https://doi.org/10.1007/978-1-4899-7687-1_138
Šerbedžija, N., Fairclough, S.: Biocybernetic loop: from awareness to evolution. In: IEEE Evolutionary Computation 2009, pp. 2063–2069. IEEE (2009)
Wang, T., et al.: Benchmarking model-based reinforcement learning. CoRR, abs/1907.02057 (2019)
Weinstein, A., Littman, M.: Open-loop planning in large-scale stochastic domains. In: AAAI 2013. AAAI Press (2013)
Weyns, D., et al.: On patterns for decentralized control in self-adaptive systems. In: de Lemos, R., Giese, H., Müller, H.A., Shaw, M. (eds.) Software Engineering for Self-Adaptive Systems II. LNCS, vol. 7475, pp. 76–107. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35813-5_4
Wirsing, M., Banâtre, J.-P., Hölzl, M., Rauschmayer, A. (eds.): Software-Intensive Systems and New Computing Paradigms - Challenges and Visions. LNCS, vol. 5380. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89437-7
Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.): Software Engineering for Collective Autonomic Systems - The ASCENS Approach. LNCS, vol. 8998. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9
Wirsing, M., Hölzl, M., Tribastone, M., Zambonelli, F.: ASCENS: engineering autonomic service-component ensembles. In: Beckert, B., Damiani, F., de Boer, F.S., Bonsangue, M.M. (eds.) FMCO 2011. LNCS, vol. 7542, pp. 1–24. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35887-6_1
Wooldridge, M.J., Jennings, N.R.: Intelligent agents: theory and practice. Knowl. Eng. Rev. 10(2), 115–152 (1995)
Zambonelli, F., Jennings, N.R., Wooldridge, M.J.: Developing multiagent systems: the Gaia method. ACM Trans. Softw. Eng. Meth. 12(3), 317–370 (2003)
Zuliani, P., Platzer, A., Clarke, E.M.: Bayesian statistical model checking with application to Simulink verification. Formal Meth. Syst. Des. 43(2), 338–367 (2013)
Acknowledgement
We thank the anonymous reviewer for constructive criticisms and helpful suggestions.
A Markov Decision Processes
A Markov Decision Process (MDP) M defines a domain as a set S of states comprising all states of the environment and the agent, a set A of agent actions, and a probability distribution \(T : p(S \vert S, A)\) describing the transition probabilities of reaching some successor state when executing an action in a given state. For expressing optimisation goals, this labelled transition system is extended by a reward function \(R : S \times A \times S \rightarrow \mathbb {R}\) which gives the expected immediate reward gained by the agent for taking each action in each state. Moreover, an initial state distribution \(\rho : p(S)\) is given.
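To make these ingredients concrete, the following is a minimal sketch of a finite MDP as plain Python functions. The two-state domain and all concrete probabilities, rewards, and names (S, A, T, R, rho) are illustrative assumptions, not part of the formal development above.

```python
import random

# Minimal sketch of a finite MDP (S, A, T, R, rho); the concrete two-state
# domain and its probabilities/rewards are hypothetical, for illustration only.
S = ["s0", "s1"]        # states of environment and agent
A = ["stay", "move"]    # agent actions

def T(s, a):
    """Sample a successor state from the transition distribution p(S | s, a)."""
    if a == "move":
        return random.choices(S, weights=[0.1, 0.9])[0]
    return s  # "stay" keeps the current state deterministically

def R(s, a, s_next):
    """Expected immediate reward for the transition (s, a, s_next)."""
    return 1.0 if s_next == "s1" else 0.0

def rho():
    """Sample an initial state from the initial state distribution p(S)."""
    return random.choices(S, weights=[1.0, 0.0])[0]
```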
An episode \(\textbf{e} \in E\) is a finite or infinite sequence of transitions \((s_i, a_i, s_{i + 1}, r_i)\), \(s_i, s_{i + 1} \in S\), \(a_i \in A\), \(r_i = R(s_i, a_i, s_{i + 1})\) in the MDP. For a given discount parameter \(\gamma \in [0,1]\) and any finite or infinite episode \(\textbf{e}\), the cumulative return \(\mathcal {R}\) is the discounted sum of rewards \(\mathcal {R} = \sum _{i = 1}^{|\textbf{e}|} \gamma ^{i} r_i\). Depending on the application, the agent behaves in an environment according to a stochastic memoryless stationary policy \(\pi : S \rightarrow p(A)\) or according to a deterministic memoryless policy \(\pi : S \rightarrow A\), with the goal of maximising the expected cumulative return \(\mathbb {E}(\mathcal {R})\).
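Continuing the sketch above, the following hypothetical rollout function samples a finite episode under a given policy and computes its discounted return, indexing transitions from i = 1 as in the formula.

```python
def rollout(policy, horizon=20, gamma=0.95):
    """Sample a finite episode under `policy` and compute its discounted return.

    `policy` is a deterministic memoryless policy S -> A; a stochastic policy
    p(A | s) would be sampled from instead.
    """
    s = rho()
    episode, ret = [], 0.0
    for i in range(1, horizon + 1):   # transitions indexed from i = 1
        a = policy(s)
        s_next = T(s, a)
        r = R(s, a, s_next)
        episode.append((s, a, s_next, r))
        ret += gamma ** i * r         # discounted sum of rewards
        s = s_next
    return episode, ret

# Example: Monte Carlo estimate of E(R) for the trivial "always move" policy.
returns = [rollout(lambda s: "move")[1] for _ in range(1000)]
print(sum(returns) / len(returns))
```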
A partially observable Markov Decision Process (POMDP) [32] is a Markov decision process together with a set \(\varOmega \) of observations and an observation probability distribution \(O : p(\varOmega \vert S, A)\).
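In code, the POMDP extension adds an observation set and a sampling function for O; the observation names and probabilities below are again made up for illustration. A POMDP agent conditions its behaviour on observations (or on a belief over states) rather than on the hidden state itself.

```python
Omega = ["bright", "dark"]  # hypothetical observations

def O(s, a):
    """Sample an observation from the distribution p(Omega | S, A); the agent
    perceives the hidden state only through this noisy channel."""
    weights = [0.8, 0.2] if s == "s1" else [0.2, 0.8]
    return random.choices(Omega, weights=weights)[0]
```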
A Constrained Markov Decision Process (CMDP) has an additional cost function \(C : S \times A \times S \rightarrow \mathbb {R}\) which can be used to express constraints and safety goals.
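Analogously, a cost can be accumulated alongside the reward over an episode; the concrete cost values below are hypothetical. A CMDP policy then maximises the expected return subject to a bound on the expected cumulative cost.

```python
def C(s, a, s_next):
    """Cost function S x A x S -> R; the concrete values are hypothetical.
    Costs typically encode constraint violations or safety-relevant effort."""
    return 1.0 if a == "move" else 0.0

def constrained_return(episode, gamma=0.95):
    """Discounted return and discounted cumulative cost of an episode.
    A CMDP asks for max E(return) subject to E(cost) <= d for some bound d."""
    ret = sum(gamma ** i * r for i, (s, a, s2, r) in enumerate(episode, start=1))
    cost = sum(gamma ** i * C(s, a, s2) for i, (s, a, s2, _) in enumerate(episode, start=1))
    return ret, cost
```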