Shielded Reinforcement Learning for Hybrid Systems

  • Conference paper
  • In: Bridging the Gap Between AI and Reality (AISoLA 2023)
  • Book series: Lecture Notes in Computer Science (LNCS, volume 14380)

Abstract

Safe and optimal controller synthesis for switched-controlled hybrid systems, which combine differential equations and discrete changes of the system's state, is known to be intricately hard. Reinforcement learning has been leveraged to construct near-optimal controllers, but their behavior is not guaranteed to be safe, even when safety is encouraged by reward engineering. One way of imposing safety on a learned controller is to use a shield, which is correct by design. However, obtaining a shield for non-linear and hybrid environments is itself intractable. In this paper, we propose the construction of a shield using the so-called barbaric method, where an approximate finite representation of an underlying partition-based two-player safety game is extracted via systematically picked samples of the true transition function. While hard safety guarantees are out of reach, we experimentally demonstrate strong statistical safety guarantees with a prototype implementation and Uppaal Stratego. Furthermore, we study the impact of the synthesized shield when applied as either a pre-shield (applied before learning a controller) or a post-shield (only applied after learning a controller). We experimentally demonstrate the superiority of the pre-shielding approach. We apply our technique to a range of case studies, including two industrial examples, and further study post-optimization of the post-shielding approach.
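
In outline, the shield construction works as follows: the continuous state space is partitioned into finitely many cells; for each cell and each controller action, the true transition function is evaluated at systematically picked sample points (the "barbaric" approximation); the resulting finite two-player safety game is then solved by a standard fixed-point iteration that prunes actions which may leave the safe set. The following Python sketch illustrates this idea; all names and the uniform-grid setup are illustrative assumptions, not the paper's actual implementation (a prototype coupled with Uppaal Stratego).

```python
import itertools
import numpy as np

def synthesize_shield(step, actions, lows, highs, dims, is_unsafe, k=3):
    """Sketch of barbaric shield synthesis over a uniform grid partition.
    step(x, a) -> next state (the true transition function, only sampled);
    is_unsafe(cell_index) marks bad cells; k = supporting points per axis."""
    widths = [(hi - lo) / n for lo, hi, n in zip(lows, highs, dims)]
    cells = list(itertools.product(*(range(n) for n in dims)))

    def cell_of(x):
        # Map a continuous state to its cell index; None = outside the model.
        idx = []
        for xi, lo, hi, n in zip(x, lows, highs, dims):
            if not lo <= xi <= hi:
                return None
            idx.append(min(int((xi - lo) / (hi - lo) * n), n - 1))
        return tuple(idx)

    def samples(cell):
        # k systematically spaced supporting points per axis, covering the
        # closed cell: the "barbaric" stand-in for its exact reachable image.
        axes = [np.linspace(lo + i * w, lo + (i + 1) * w, k)
                for i, lo, w in zip(cell, lows, widths)]
        return itertools.product(*axes)

    # Finite approximation of the transition relation: the cells that the
    # sampled successors of (cell, action) can land in.
    succ = {(c, a): {cell_of(step(np.array(s), a)) for s in samples(c)}
            for c in cells for a in actions}

    # Solve the induced two-player safety game by fixed-point iteration:
    # prune actions that may leave the safe set; a cell with no action loses.
    allowed = {c: set(actions) for c in cells if not is_unsafe(c)}
    changed = True
    while changed:
        changed = False
        safe = set(allowed)
        for c in list(allowed):
            ok = {a for a in allowed[c] if succ[(c, a)] <= safe}
            if ok != allowed[c]:
                allowed[c], changed = ok, True
                if not ok:
                    del allowed[c]
    return allowed  # winning cells -> permitted (shielded) actions
```

Used as a pre-shield, the resulting map restricts which actions a learner may even explore during training; used as a post-shield, it only overrides a trained controller whenever the chosen action is not permitted in the current cell.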

Notes

  1. For SHS with an upper bound on the number of discrete jumps up to a given time bound T, the equation is well-defined.

  2. We assume that at most one bounce can take place within the period P (see the sketch below).
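
Note 2 refers to the bouncing-ball case study: if at most one bounce can occur within a period P, the successor state has a closed form, obtained by solving the free-fall parabola for its earliest impact time. The following is a minimal Python sketch under that assumption; the constants are illustrative and not taken from the paper.

```python
import math

G = -9.81     # gravitational acceleration (illustrative value)
BETA = 0.91   # velocity dampening factor on impact (illustrative value)

def ball_step(p, v, P):
    """Advance position p (>= 0) and velocity v by one period P, assuming at
    most one bounce within P (note 2). Free fall: p(t) = p + v*t + G*t^2/2."""
    disc = v * v - 2 * G * p                 # discriminant of p(t) = 0
    t_hit = None
    if disc >= 0:
        roots = [(-v - math.sqrt(disc)) / G, (-v + math.sqrt(disc)) / G]
        hits = [t for t in roots if 1e-9 < t <= P]
        if hits:
            t_hit = min(hits)                # earliest impact in the period
    if t_hit is None:                        # no impact: pure free fall
        return p + v * P + 0.5 * G * P * P, v + G * P
    v_bounce = -BETA * (v + G * t_hit)       # reflect and dampen velocity
    r = P - t_hit                            # remaining time after the bounce
    return v_bounce * r + 0.5 * G * r * r, v_bounce + G * r
```

For example, ball_step(5.0, 0.0, 0.1) advances a ball dropped from height 5 by one decision period of length 0.1.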

Acknowledgments

This research was partly supported by DIREC - Digital Research Centre Denmark and the Villum Investigator Grant S4OS - Scalable analysis and Synthesis of Safe, Secure and Optimal Strategies for Cyber-Physical Systems.

Author information

Correspondence to Asger Horn Brorholt.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Brorholt, A.H., Jensen, P.G., Larsen, K.G., Lorber, F., Schilling, C. (2024). Shielded Reinforcement Learning for Hybrid Systems. In: Steffen, B. (ed.) Bridging the Gap Between AI and Reality. AISoLA 2023. Lecture Notes in Computer Science, vol 14380. Springer, Cham. https://doi.org/10.1007/978-3-031-46002-9_3

  • DOI: https://doi.org/10.1007/978-3-031-46002-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46001-2

  • Online ISBN: 978-3-031-46002-9

  • eBook Packages: Computer Science, Computer Science (R0)
