Towards Systematically Engineering Autonomous Systems Using Reinforcement Learning and Planning

Wirsing, Martin; Belzner, Lenz

doi:10.1007/978-3-031-31476-6_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13160)

Abstract

Autonomous systems need to be able to adapt dynamically to changing requirements and environmental conditions without redeployment and without interruption of the system's functionality. The EU project ASCENS has developed a comprehensive suite of foundational theories and methods for building autonomic systems. In this paper we specialise the EDLC process model of ASCENS to deal with planning and reinforcement learning techniques. We present the “AIDL” life cycle and illustrate it with two case studies: simulation-based online planning and the PSyCo reinforcement learning approach for synthesizing agent policies from hard and soft requirements. Related work and potential avenues for future research are discussed.

Dedicated to Manuel Hermenegildo.

Notes

1. Also called internal model or simulation model in the literature.
2. https://de.mathworks.com/products/reinforcement-learning.html.
3. https://gym.openai.com/.
4. https://pytorch.org/.
5. https://www.tensorflow.org/.

References

ASCENS: Autonomic Component Ensembles. Integrated Project, 01 Oct 2010–31 Mar 2015, Grant agreement no: 257414, EU 7th Framework Programme. http://www.ascens-ist.eu/. Accessed 21 April 2020
Gartner Inc.: Market Guide for AIOps Platforms (2019). https://www.bmc.com/forms/tools-and-strategies-for-effective-aiops.html. Accessed 07 Oct 2020
Google Cloud Solutions: MLOps: Continuous delivery and automation pipelines in machine learning. https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning. Accessed 07 Oct 2020
OpenAI: Spinning Up in Deep RL! Part 2: Kinds of RL Algorithms (2018). https://spinningup.openai.com. Accessed 07 July 2020
Abeywickrama, D., Bicocchi, N., Mamei, M., Zambonelli, F.: The SOTA approach to engineering collective adaptive systems. Int. J. Softw. Tools Technol. Transf. 22(4), 399–415 (2020)
Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: AAAI, pp. 2669–2678. AAAI Press (2018)
Altman, E.: Constrained Markov Decision Processes, vol. 7. CRC Press, Boca Raton (1999)
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. CoRR, abs/1606.06565 (2016)
Beavis, B., Dobbs, I.: Optimisation and Stability Theory for Economic Analysis. Cambridge University Press, Cambridge (1990)
Belzner, L., Hennicker, R., Wirsing, M.: OnPlan: a framework for simulation-based online planning. In: Braga, C., Ölveczky, P.C. (eds.) FACS 2015. LNCS, vol. 9539, pp. 1–30. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-28934-2_1
Belzner, L., Hölzl, M.M., Koch, N., Wirsing, M.: Collective autonomic systems: towards engineering principles and their foundations. Trans. Found. Mastering Chang. 1, 180–200 (2016)
Belzner, L., Wirsing, M.: Synthesizing safe policies under probabilistic constraints with reinforcement learning and Bayesian model checking. Sci. Comput. Program. 206, 102620 (2021)
Bernardo, M., De Nicola, R., Hillston, J.: Formal Methods for the Quantitative Evaluation of Collective Adaptive Systems, SFM 2016, vol. 9700, Lecture Notes in Computer Science. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34096-8
Bresciani, P., Perini, A., Giorgini, P., Giunchiglia, F., Mylopoulos, J.: Tropos: an agent-oriented software development methodology. JAAMAS 8(3), 203–236 (2004)
Browne, C., et al.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)
Brun, Y., et al.: Engineering self-adaptive systems through feedback loops. In: Cheng, B.H.C., de Lemos, R., Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for Self-Adaptive Systems. LNCS, vol. 5525, pp. 48–70. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02161-9_3
Bureš, T., et al.: A life cycle for the development of autonomic systems: the e-mobility showcase. In: SASO Workshops, pp. 71–76 (2013)
Clavera, I., Rothfuss, J., Schulman, J., Fujita, Y., Asfour, T., Abbeel, P.: Model-based reinforcement learning via meta-policy optimization. In: CoRL 2018, Proceedings of Machine Learning Research, vol. 87, pp. 617–629. PMLR (2018)
De Nicola, R., Loreti, M., Pugliese, R., Tiezzi, F.: A formal approach to autonomic systems programming: the SCEL language. ACM Trans. Auton. Adapt. Syst. 9(2), 7:1–7:29 (2014)
Drugan, M.M.: Reinforcement learning versus evolutionary computation: a survey on hybrid algorithms. Swarm Evol. Comput. 44, 228–246 (2019)
Fernandez-Marquez, J.L., Serugendo, G.D.M., Montagna, S., Viroli, M., Arcos, J.L.: Description and composition of bio-inspired design patterns: a complete overview. Nat. Comput. 12(1), 43–67 (2013)
Gabor, T., et al.: The scenario coevolution paradigm: adaptive quality assurance for adaptive systems. Int. J. Softw. Tools Technol. Transf. 22, 457–476 (2020)
Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal Asp. Comput. 6(5), 512–535 (1994)
Hasanbeig, M., Abate, A., Kroening, D.: Cautious reinforcement learning with logical constraints. In: AAMAS, pp. 483–491. International Foundation for Autonomous Agents and Multiagent Systems (2020)
Hoch, N., Bensler, H.-P., Abeywickrama, D., Bureš, T., Montanari, U.: The E-mobility case study. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 513–533. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_17
Horn, P.: Autonomic computing: IBM perspective on the state of information technology. IBM T.J. Watson Labs, NY (2001)
Hölzl, M., Koch, N., Puviani, M., Wirsing, M., Zambonelli, F.: The ensemble development life cycle and best practices for collective autonomic systems. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 325–354. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_9
Hölzl, M., Rauschmayer, A., Wirsing, M.: Engineering of software-intensive systems: state of the art and research challenges. In: Wirsing, M., Banâtre, J.-P., Hölzl, M., Rauschmayer, A. (eds.) Software-Intensive Systems and New Computing Paradigms. LNCS, vol. 5380, pp. 1–44. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89437-7_1
Hölzl, M., Wirsing, M.: Towards a system model for ensembles. In: Agha, G., Danvy, O., Meseguer, J. (eds.) Formal Modeling: Actors, Open Systems, Biological Systems. LNCS, vol. 7000, pp. 241–261. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24933-4_12
IBM: An architectural blueprint for autonomic computing. Technical report, IBM Corporation (2005)
Inverardi, P., Mori, M.: A software lifecycle process to support consistent evolutions. In: de Lemos, R., Giese, H., Müller, H.A., Shaw, M. (eds.) Software Engineering for Self-Adaptive Systems II. LNCS, vol. 7475, pp. 239–264. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35813-5_10
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)
Kernbach, S., Schmickl, T., Timmis, J.: Collective adaptive systems: challenges beyond evolvability. CoRR abs/1108.5643 (2011)
Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29
Krutisch, R., Meier, P., Wirsing, M.: The AgentComponent approach, combining agents, and components. In: Schillo, M., Klusch, M., Müller, J., Tianfield, H. (eds.) MATES 2003. LNCS (LNAI), vol. 2831, pp. 1–12. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39869-1_1
Legay, A., Delahaye, B., Bensalem, S.: Statistical model checking: an overview. In: Barringer, H., et al. (eds.) RV 2010. LNCS, vol. 6418, pp. 122–135. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16612-9_11
Loreti, M., Hillston, J.: Modelling and analysis of collective adaptive systems with CARMA and its tools. In: Bernardo, M., De Nicola, R., Hillston, J. (eds.) SFM 2016. LNCS, vol. 9700, pp. 83–119. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34096-8_4
Mayer, P., et al.: The autonomic cloud. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 495–512. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_16
Moerland, T.M., Broekens, J., Jonker, C.M.: A framework for reinforcement learning and planning. CoRR, abs/2006.15009 (2020)
Moerland, T.M., Broekens, J., Jonker, C.M.: Model-based reinforcement learning: a survey. CoRR, abs/2006.16712 (2020)
Moerland, T.M., Deichler, A., Baldi, S., Broekens, J., Jonker, C.M.: Think too fast nor too slow: The computational trade-off between planning and reinforcement learning. CoRR, abs/2005.07404 (2020)
Nagabandi, A., et al.: Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In: ICLR 2019. OpenReview.net (2019)
Ong, S.C.W., Png, S.W., Hsu, D., Lee, W.S.: Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robot. Res. 29(8), 1053–1068 (2010)
Pinciroli, C., Bonani, M., Mondada, F., Dorigo, M.: Adaptation and awareness in robot ensembles: scenarios and algorithms. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 471–494. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_15
Puviani, M., Cabri, G., Zambonelli, F.: Patterns for self-adaptive systems: agent-based simulations. EAI Endorsed Trans. Self-Adapt. Syst. 1(1), e4 (2015)
Rao, A.S., Georgeff, M.P.: Modeling rational agents within a BDI-architecture. In: Proceedings of the Knowledge Representation and Reasoning, pp. 473–484 (1991)
Ray, A., Achiam, J., Amodei, D.: Benchmarking safe exploration in deep reinforcement learning. Technical report, OpenAI (2019)
Roche, J.: Adopting DevOps practices in quality assurance. Commun. ACM 56(11), 38–43 (2013)
Ross, S., Pineau, J., Paquet, S., Chaib-draa, B.: Online planning algorithms for POMDPs. J. Artif. Intell. Res. 32, 663–704 (2008)
Sebastio, S., Vandin, A.: MultiVeStA: statistical model checking for discrete event simulators. In: ValueTools 2013, pp. 310–315. ICST/ACM (2013)
Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Silver, D., et al.: Mastering the game of go without human knowledge. Nature 550(7676), 354–359 (2017)
Sutton, R.S., Barto, A.G.: Reinforcement Learning - an Introduction. Adaptive Computation and Machine Learning, 2nd edn. MIT Press, Cambridge (2018)
Szepesvári, C.: Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, pp. 1–103. Morgan & Claypool Publishers, California (2010)
Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res. 10, 1633–1685 (2009)
Thrun, S., Pratt, L.Y.: Learning to learn: introduction and overview. In: Thrun, S., Pratt, L.Y. (eds.) Learning to Learn, pp. 3–17. Springer, Boston (1998). https://doi.org/10.1007/978-1-4615-5529-2_1
Tschaikowski, M., Tribastone, M.: A unified framework for differential aggregations in Markovian process algebra. J. Log. Alg. Meth. Prog. 84(2), 238–258 (2015)
Vassev, E., Hinchey, M.: Engineering requirements for autonomy features. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 379–403. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_11
Vilalta, R., Giraud-Carrier, C., Brazdil, P., Soares, C.: Inductive transfer. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning and Data Mining, pp. 666–671. Springer, Boston (2017). https://doi.org/10.1007/978-1-4899-7687-1_138
Šerbedžija, N., Fairclough, S.: Biocybernetic loop: from awareness to evolution. In: IEEE Evolutionary Computation 2009, pp. 2063–2069. IEEE (2009)
Wang, T., et al.: Benchmarking model-based reinforcement learning. CoRR, abs/1907.02057 (2019)
Weinstein, A., Littman, M.: Open-loop planning in large-scale stochastic domains. In: AAAI 2013. AAAI Press (2013)
Weyns, D., et al.: On patterns for decentralized control in self-adaptive systems. In: de Lemos, R., Giese, H., Müller, H.A., Shaw, M. (eds.) Software Engineering for Self-Adaptive Systems II. LNCS, vol. 7475, pp. 76–107. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35813-5_4
Wirsing, M., Banâtre, J.-P., Hölzl, M., Rauschmayer, A.: Software-Intensive Systems and New Computing Paradigms - Challenges and Visions, vol. 5380. Lecture Notes in Computer Science. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89437-7
Wirsing, M., Hölzl, M.M., Koch, N., Mayer, P. (eds.): Software Engineering for Collective Autonomic Systems - The ASCENS Approach, vol. 8998. Lecture Notes in Computer Science. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9
Wirsing, M., Hölzl, M., Tribastone, M., Zambonelli, F.: ASCENS: engineering autonomic service-component ensembles. In: Beckert, B., Damiani, F., de Boer, F.S., Bonsangue, M.M. (eds.) FMCO 2011. LNCS, vol. 7542, pp. 1–24. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35887-6_1
Wooldridge, M.J., Jennings, N.R.: Intelligent agents: theory and practice. Knowl. Eng. Rev. 10(2), 115–152 (1995)
Zambonelli, F., Jennings, N.R., Wooldridge, M.J.: Developing multiagent systems: the Gaia method. ACM Trans. Softw. Eng. Meth. 12(3), 317–370 (2003)
Zuliani, P., Platzer, A., Clarke, E.M.: Bayesian statistical model checking with application to Simulink verification. Formal Meth. Syst. Des. 43(2), 338–367 (2013)

Acknowledgement

We thank the anonymous reviewer for constructive criticisms and helpful suggestions.

Author information

Authors and Affiliations

Ludwig-Maximilians-Universität München, Munich, Germany
Martin Wirsing
Technische Hochschule Ingolstadt, Ingolstadt, Germany
Lenz Belzner

Corresponding author

Correspondence to Martin Wirsing.

Editor information

Editors and Affiliations

IMDEA Software Institute, Pozuelo de Alarcón, Madrid, Spain
Pedro Lopez-Garcia
Roskilde University, Roskilde, Denmark
John P. Gallagher
Università di Verona, Verona, Italy
Roberto Giacobazzi

Appendix A: Markov Decision Processes

A Markov Decision Process (MDP) M defines a domain as a set S of states consisting of all states of the environment and the agent, a set A of agent actions, and a probability distribution $T : p(S \vert S, A)$ describing the transition probabilities of reaching some successor state when executing an action in a given state. For expressing optimisation goals, the labelled transition system is extended by a reward function $R : S \times A \times S \rightarrow \mathbb {R}$, which gives the expected immediate reward gained by the agent for taking each action in each state. Moreover, an initial state distribution $\rho : p(S)$ is given.

An episode $\textbf{e} \in E$ is a finite or infinite sequence of transitions $(s_i, a_i, s_{i + 1}, r_i)$, $s_i, s_{i + 1} \in S$, $a_i \in A$, $r_i = R(s_i, a_i, s_{i + 1})$ in the MDP. For a given discount parameter $\gamma \in [0,1]$ and any finite or infinite episode $\textbf{e}$, the cumulative return $\mathcal {R}$ is the discounted sum of rewards $\mathcal {R} = \sum _{i = 1}^{|\textbf{e}|} \gamma ^{i} r_i$. Depending on the application, the agent behaves in an environment according to a memoryless stationary policy $\pi : S \rightarrow p(A)$ or according to a deterministic memoryless policy $\pi : S \rightarrow A$, with the goal of maximising the expectation of the cumulative return $\mathbb {E}(\mathcal {R})$.
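
The definitions above can be illustrated with a minimal Python sketch. It samples a finite episode from a small MDP under a stochastic memoryless policy and computes the cumulative return with the indexing used in the text (the first reward is weighted by $\gamma^1$). The two-state domain, the action names, the reward values and all function names are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
import random

def discounted_return(rewards, gamma):
    """Sum of gamma**i * r_i for i = 1..n, following the indexing in the text."""
    return sum(gamma ** i * r for i, r in enumerate(rewards, start=1))

def sample(dist, rng):
    """Draw one outcome from a dict mapping outcomes to probabilities."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def sample_episode(T, R, policy, rho, horizon, rng):
    """Sample a finite episode of transitions (s_i, a_i, s_{i+1}, r_i)."""
    s = sample(rho, rng)                  # initial state drawn from rho
    episode = []
    for _ in range(horizon):
        a = sample(policy[s], rng)        # pi : S -> p(A)
        s_next = sample(T[(s, a)], rng)   # T : p(S | S, A)
        episode.append((s, a, s_next, R(s, a, s_next)))
        s = s_next
    return episode

# Hypothetical toy domain (assumption, not from the chapter).
T = {
    ("low", "wait"): {"low": 1.0},
    ("low", "charge"): {"high": 0.9, "low": 0.1},
    ("high", "wait"): {"high": 0.8, "low": 0.2},
    ("high", "charge"): {"high": 1.0},
}

def R(s, a, s_next):
    """Reward 1 for reaching 'high', minus a small price for charging."""
    return (1.0 if s_next == "high" else 0.0) - (0.1 if a == "charge" else 0.0)

rho = {"low": 1.0}
policy = {"low": {"charge": 1.0}, "high": {"wait": 0.5, "charge": 0.5}}

rng = random.Random(0)
episode = sample_episode(T, R, policy, rho, horizon=5, rng=rng)
rewards = [r for (_, _, _, r) in episode]
print(discounted_return(rewards, gamma=0.9))
```

Averaging `discounted_return` over many sampled episodes would give a Monte Carlo estimate of $\mathbb {E}(\mathcal {R})$ for the chosen policy.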

A partially observable Markov Decision Process (POMDP) [32] is a Markov decision process together with a set $\varOmega $ of observations and an observation probability distribution $O : p(\varOmega \vert S, A)$.
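
Because the agent in a POMDP sees only observations, it is common to maintain a belief, that is, a probability distribution over states, and to update it after each action and observation. The following Python sketch shows the standard Bayesian belief update under one common convention, namely that the observation probability is conditioned on the successor state and the executed action. That convention, the toy numbers and the function name `belief_update` are assumptions made for illustration; the text itself only fixes the signature $O : p(\varOmega \vert S, A)$.

```python
def belief_update(belief, action, observation, states, T, O):
    """Bayes filter: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    unnormalised = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next under the old belief.
        predicted = sum(
            T[(s, action)].get(s_next, 0.0) * belief.get(s, 0.0) for s in states
        )
        # Correction step: weight by the likelihood of the observation.
        unnormalised[s_next] = O[(s_next, action)].get(observation, 0.0) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}

# Hypothetical two-state example (assumption, not from the chapter).
states = ["low", "high"]
T = {
    ("low", "wait"): {"low": 1.0},
    ("high", "wait"): {"high": 0.8, "low": 0.2},
}
O = {
    ("low", "wait"): {"dim": 0.9, "bright": 0.1},
    ("high", "wait"): {"dim": 0.2, "bright": 0.8},
}

prior = {"low": 0.5, "high": 0.5}
posterior = belief_update(prior, "wait", "bright", states, T, O)
print(posterior)
```

In this example the observation "bright" is far more likely in state "high", so the updated belief shifts towards "high" even though the transition model favours "low".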

A Constrained Markov Decision Process (CMDP) has an additional cost function $C : S \times A \times S \rightarrow \mathbb {R}$ which can be used for expressing constraints and safety goals.
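
One common way to use the cost function is to require that the expected discounted cumulative cost stays below a budget while the return is maximised. The Python sketch below estimates that expected cost from a set of episodes and checks it against a budget. The budget parameter, the example cost function, the hand-written episodes and all function names are assumptions for illustration; the text does not prescribe a particular constraint form.

```python
def discounted_sum(values, gamma):
    """Sum of gamma**i * v_i for i = 1..n, matching the return definition above."""
    return sum(gamma ** i * v for i, v in enumerate(values, start=1))

def estimate_expected_cost(episodes, C, gamma):
    """Average discounted cumulative cost over episodes of (s, a, s_next, r) tuples."""
    totals = [
        discounted_sum([C(s, a, s_next) for (s, a, s_next, _r) in episode], gamma)
        for episode in episodes
    ]
    return sum(totals) / len(totals)

def satisfies_constraint(episodes, C, gamma, budget):
    """Check the (assumed) constraint: estimated expected cost <= budget."""
    return estimate_expected_cost(episodes, C, gamma) <= budget

def C(s, a, s_next):
    """Hypothetical safety cost: 1 whenever an unsafe state is entered."""
    return 1.0 if s_next == "unsafe" else 0.0

# Two hand-written episodes (assumption, not data from the chapter).
episodes = [
    [("a", "go", "b", 1.0), ("b", "go", "unsafe", 0.0)],
    [("a", "stay", "a", 0.0), ("a", "stay", "a", 0.0)],
]

print(estimate_expected_cost(episodes, C, gamma=0.5))
print(satisfies_constraint(episodes, C, gamma=0.5, budget=0.2))
```

With only a handful of episodes such an estimate is very noisy; a statistically sound check would need many samples or a dedicated verification method.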

Cite this chapter

Wirsing, M., Belzner, L. (2023). Towards Systematically Engineering Autonomous Systems Using Reinforcement Learning and Planning. In: Lopez-Garcia, P., Gallagher, J.P., Giacobazzi, R. (eds) Analysis, Verification and Transformation for Declarative Programming and Intelligent Systems. Lecture Notes in Computer Science, vol 13160. Springer, Cham. https://doi.org/10.1007/978-3-031-31476-6_16

DOI: https://doi.org/10.1007/978-3-031-31476-6_16
Published: 17 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31475-9
Online ISBN: 978-3-031-31476-6
eBook Packages: Computer Science, Computer Science (R0)