Abstract
Safe and optimal controller synthesis for switched-controlled hybrid systems, which combine differential equations with discrete changes of the system's state, is known to be intricately hard. Reinforcement learning has been leveraged to construct near-optimal controllers, but their behavior is not guaranteed to be safe, even when safety is encouraged by reward engineering. One way of imposing safety on a learned controller is to use a shield, which is correct by design. However, obtaining a shield for non-linear and hybrid environments is itself intractable. In this paper, we propose the construction of a shield using the so-called barbaric method, where an approximate finite representation of an underlying partition-based two-player safety game is extracted via systematically picked samples of the true transition function. While hard safety guarantees are out of reach, we experimentally demonstrate strong statistical safety guarantees with a prototype implementation and Uppaal Stratego. Furthermore, we study the impact of the synthesized shield when applied as either a pre-shield (applied before learning a controller) or a post-shield (only applied after learning a controller). We experimentally demonstrate the superiority of the pre-shielding approach. We apply our technique to a range of case studies, including two industrial examples, and further study post-optimization of the post-shielding approach.
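To illustrate the idea behind the barbaric method described above, the following is a minimal sketch (not the paper's actual implementation): the state space is partitioned into cells, each cell is probed with systematically picked samples of a transition function, and the safe cell/action pairs are computed as a fixed point of the resulting finite two-player safety game. All names here (`step`, `is_unsafe`, the grid parameters, the toy one-dimensional dynamics) are illustrative assumptions.

```python
import itertools

# Toy 1-D dynamics standing in for the true transition function:
# action "go" drifts the state upward, "stop" drifts it downward.
def step(x, action, dt=0.1):
    v = 2.0 if action == "go" else -1.0
    return x + v * dt

def is_unsafe(x):
    return x >= 10.0  # assumed unsafe region

ACTIONS = ["go", "stop"]
LO, HI, CELLS = 0.0, 12.0, 24      # partition [0, 12) into 24 cells
WIDTH = (HI - LO) / CELLS
SAMPLES = 4                        # systematically picked samples per cell

def cell_of(x):
    return min(max(int((x - LO) / WIDTH), 0), CELLS - 1)

def samples_in(c):
    lo = LO + c * WIDTH
    return [lo + (i + 0.5) * WIDTH / SAMPLES for i in range(SAMPLES)]

# reachable[(c, a)]: cells that some sample of cell c may reach under a.
reachable = {(c, a): {cell_of(step(x, a)) for x in samples_in(c)}
             for c, a in itertools.product(range(CELLS), ACTIONS)}

# Fixed point of the safety game: a cell is lost if it intersects the
# unsafe region, or if every action may lead to a lost cell (the
# environment player resolves the within-cell nondeterminism).
lost = {c for c in range(CELLS) if any(is_unsafe(x) for x in samples_in(c))}
changed = True
while changed:
    changed = False
    for c in range(CELLS):
        if c not in lost and all(reachable[(c, a)] & lost for a in ACTIONS):
            lost.add(c)
            changed = True

def shield(x):
    """Actions allowed in state x. A pre-shield restricts the learner to
    this set during training; a post-shield overrides a learned action
    that falls outside it."""
    c = cell_of(x)
    return [a for a in ACTIONS if not (reachable[(c, a)] & lost)]
```

Because the transition function is only sampled, the resulting shield is approximate rather than formally sound, which matches the statistical (not hard) guarantees discussed in the abstract. For instance, `shield(0.0)` permits both actions, while near the unsafe boundary only `"stop"` survives.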
Notes
1. For SHS with an upper bound on the number of discrete jumps up to a given time bound T, the equation is well-defined.
2. We assume that at most one bounce can take place within the period P.
Acknowledgments
This research was partly supported by DIREC - Digital Research Centre Denmark and the Villum Investigator Grant S4OS - Scalable analysis and Synthesis of Safe, Secure and Optimal Strategies for Cyber-Physical Systems.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Brorholt, A.H., Jensen, P.G., Larsen, K.G., Lorber, F., Schilling, C. (2024). Shielded Reinforcement Learning for Hybrid Systems. In: Steffen, B. (eds) Bridging the Gap Between AI and Reality. AISoLA 2023. Lecture Notes in Computer Science, vol 14380. Springer, Cham. https://doi.org/10.1007/978-3-031-46002-9_3
DOI: https://doi.org/10.1007/978-3-031-46002-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46001-2
Online ISBN: 978-3-031-46002-9
eBook Packages: Computer Science (R0)