Abstract
For decades, radiation-induced failures have been a known issue for aerospace systems, where redundancy mechanisms are employed for protection. As feature sizes and operating voltages shrink, these failures are increasingly becoming an issue even for terrestrial applications. Unfortunately, redundancy increases cost, area, and power consumption, which can hinder its use in cost- and power-sensitive safety-critical applications such as automotive. To overcome this limitation, multiple software-based approaches have been proposed; however, they assume an underlying error-free operating system. In this paper, we investigate the radiation reliability of two dependability-oriented real-time operating systems: the popular eCos operating system, hardened through aspect-oriented programming, and dOSEK, an embedded kernel designed from the ground up with reliability as a major concern. Both operating systems were evaluated through extensive neutron-beam testing on a state-of-the-art 28 nm ARM-based system-on-chip, and their fault-tolerance mechanisms reduced the overall cross-section relative to their baselines by up to 91% and 74%, respectively.
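To make the cross-section figures concrete, the following is a minimal sketch that assumes the common beam-testing definition of cross-section as observed failures divided by neutron fluence (cm²), and computes the relative reduction of a hardened variant against its baseline. The helper names, the failure counts, and the fluence are hypothetical placeholders, not measurements from this study; the 13 n/(cm²·h) sea-level reference flux used for the FIT conversion is likewise an assumption for illustration.

```python
"""Sketch: radiation cross-section and relative reduction from beam data.

Assumes sigma = failures / fluence (cm^2). All numbers are hypothetical,
not results from this paper.
"""


def cross_section(failures: int, fluence_n_per_cm2: float) -> float:
    """Return the cross-section in cm^2: observed failures per unit fluence."""
    if fluence_n_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return failures / fluence_n_per_cm2


def relative_reduction(sigma_baseline: float, sigma_hardened: float) -> float:
    """Fraction by which hardening lowers the cross-section (0.9 means 90%)."""
    return 1.0 - sigma_hardened / sigma_baseline


def fit_rate(sigma_cm2: float, flux_n_per_cm2_h: float = 13.0) -> float:
    """Failures per 1e9 device-hours at the given neutron flux.

    The default of 13 n/(cm^2 h) is an assumed sea-level reference flux.
    """
    return sigma_cm2 * flux_n_per_cm2_h * 1e9


if __name__ == "__main__":
    # Hypothetical run: baseline and hardened kernels under the same fluence.
    fluence = 1.0e11  # n/cm^2 (hypothetical)
    base = cross_section(200, fluence)
    hard = cross_section(20, fluence)
    print(f"baseline sigma = {base:.2e} cm^2")
    print(f"hardened sigma = {hard:.2e} cm^2")
    print(f"reduction      = {relative_reduction(base, hard):.0%}")
    print(f"hardened FIT   = {fit_rate(hard):.1f}")
```

When both variants receive the same fluence, the reduction simplifies to 1 - N_hardened/N_baseline; normalizing by fluence matters when the runs accumulate different beam exposure.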
Notes
- 1.
- 2.
- 3. It is important to note that this is based solely on the estimated failure-rate figures and assumes that all failures could lead to dangerous consequences; no hazard and risk assessment was carried out, nor was the software tested for coverage. We do not claim that the equipment under control (EUC) achieves these SILs.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Santini, T. et al. (2017). Effectiveness of Software-Based Hardening for Radiation-Induced Soft Errors in Real-Time Operating Systems. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds) Architecture of Computing Systems - ARCS 2017. ARCS 2017. Lecture Notes in Computer Science, vol 10172. Springer, Cham. https://doi.org/10.1007/978-3-319-54999-6_1
DOI: https://doi.org/10.1007/978-3-319-54999-6_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54998-9
Online ISBN: 978-3-319-54999-6
eBook Packages: Computer Science, Computer Science (R0)