Abstract
Safe and optimal controller synthesis for switched-controlled hybrid systems, which combine differential equations with discrete changes of the system's state, is known to be intricately hard. Reinforcement learning has been leveraged to construct near-optimal controllers, but their behavior is not guaranteed to be safe, even when safety is encouraged by reward engineering. One way of imposing safety on a learned controller is to use a shield, which is correct by design. However, obtaining a shield for non-linear and hybrid environments is itself intractable. In this paper, we propose the construction of a shield using the so-called barbaric method, where an approximate finite representation of an underlying partition-based two-player safety game is extracted via systematically picked samples of the true transition function. While hard safety guarantees are out of reach, we experimentally demonstrate strong statistical safety guarantees with a prototype implementation and Uppaal Stratego. Furthermore, we study the impact of the synthesized shield when applied as either a pre-shield (applied before learning a controller) or a post-shield (only applied after learning a controller). We experimentally demonstrate the superiority of the pre-shielding approach. We apply our technique to a range of case studies, including two industrial examples, and further study post-optimization of the post-shielding approach.
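To illustrate the idea behind the barbaric method described above, the following is a minimal sketch (not the paper's actual implementation): the state space is partitioned into cells, each cell is probed with systematically picked samples of a transition function, and the safe cell/action pairs are computed as a fixed point of the resulting finite two-player safety game. All names here (`step`, `is_unsafe`, the grid parameters, the toy one-dimensional dynamics) are illustrative assumptions.

```python
import itertools

# Toy 1-D dynamics standing in for the true transition function:
# action "go" drifts the state upward, "stop" drifts it downward.
def step(x, action, dt=0.1):
    v = 2.0 if action == "go" else -1.0
    return x + v * dt

def is_unsafe(x):
    return x >= 10.0  # assumed unsafe region

ACTIONS = ["go", "stop"]
LO, HI, CELLS = 0.0, 12.0, 24      # partition [0, 12) into 24 cells
WIDTH = (HI - LO) / CELLS
SAMPLES = 4                        # systematically picked samples per cell

def cell_of(x):
    return min(max(int((x - LO) / WIDTH), 0), CELLS - 1)

def samples_in(c):
    lo = LO + c * WIDTH
    return [lo + (i + 0.5) * WIDTH / SAMPLES for i in range(SAMPLES)]

# reachable[(c, a)]: cells that some sample of cell c may reach under a.
reachable = {(c, a): {cell_of(step(x, a)) for x in samples_in(c)}
             for c, a in itertools.product(range(CELLS), ACTIONS)}

# Fixed point of the safety game: a cell is lost if it intersects the
# unsafe region, or if every action may lead to a lost cell (the
# environment player resolves the within-cell nondeterminism).
lost = {c for c in range(CELLS) if any(is_unsafe(x) for x in samples_in(c))}
changed = True
while changed:
    changed = False
    for c in range(CELLS):
        if c not in lost and all(reachable[(c, a)] & lost for a in ACTIONS):
            lost.add(c)
            changed = True

def shield(x):
    """Actions allowed in state x. A pre-shield restricts the learner to
    this set during training; a post-shield overrides a learned action
    that falls outside it."""
    c = cell_of(x)
    return [a for a in ACTIONS if not (reachable[(c, a)] & lost)]
```

Because the transition function is only sampled, the resulting shield is approximate rather than formally sound, which matches the statistical (not hard) guarantees discussed in the abstract. For instance, `shield(0.0)` permits both actions, while near the unsafe boundary only `"stop"` survives.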
Notes
1. For SHS with an upper bound on the number of discrete jumps up to a given time bound T, the equation is well-defined.
2. We assume that at most one bounce can take place within the period P.
Acknowledgments
This research was partly supported by DIREC - Digital Research Centre Denmark and the Villum Investigator Grant S4OS - Scalable analysis and Synthesis of Safe, Secure and Optimal Strategies for Cyber-Physical Systems.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Brorholt, A.H., Jensen, P.G., Larsen, K.G., Lorber, F., Schilling, C. (2024). Shielded Reinforcement Learning for Hybrid Systems. In: Steffen, B. (eds) Bridging the Gap Between AI and Reality. AISoLA 2023. Lecture Notes in Computer Science, vol 14380. Springer, Cham. https://doi.org/10.1007/978-3-031-46002-9_3
DOI: https://doi.org/10.1007/978-3-031-46002-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46001-2
Online ISBN: 978-3-031-46002-9
eBook Packages: Computer Science (R0)