iBet uBet web content aggregator. Adding the entire web to your favor.

Link to original content: https://doi.org/10.1145/2818302.2818304
Hardening an L4 microkernel against soft errors by aspect-oriented programming and whole-program analysis | Proceedings of the 8th Workshop on Programming Languages and Operating Systems
research-article

Hardening an L4 microkernel against soft errors by aspect-oriented programming and whole-program analysis

Published: 04 October 2015

Abstract

Transient hardware faults in computer systems have become widespread as shrinking structures and low supply voltages reduce the amount of energy needed to trigger a fault. This paper describes the latest improvements to a software-based fault-tolerance mechanism called Generic Object Protection (GOP). GOP is based on aspect-oriented programming in AspectC++ and has been used in a case study to harden the L4/Fiasco.OC microkernel. The improved GOP avoids 60% of kernel failures at an acceptable overhead of 19% in code size and less than 1% in runtime. The improvements rely on static whole-program analysis and have been implemented as a prototype. As an outlook, the paper presents envisioned language extensions that would provide whole-program control-flow and data-flow analyses in future AspectC++ versions.
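
The general idea of protecting an object with redundancy that is checked before each use can be illustrated with a minimal plain C++ sketch. It is not the paper's GOP implementation and uses no AspectC++: the names `ThreadState`, `Protected`, `checksum`, and `inject_bit_flip` are hypothetical, the checksum-plus-shadow-copy scheme is an assumption chosen for brevity, and the checks are called by hand, whereas an aspect-oriented approach would presumably weave them in automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a small kernel object; not taken from Fiasco.OC.
struct ThreadState {
    std::uint32_t priority;
    std::uint32_t timeslice;
};

// Polynomial checksum over the raw bytes. Because 31 is odd, changing any
// single byte always changes the result modulo 2^32.
inline std::uint32_t checksum(const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) sum = sum * 31u + p[i];
    return sum;
}

// Illustrative wrapper (assumption): keeps a checksum and a shadow copy.
// T is assumed to be trivially copyable and free of padding bytes.
template <typename T>
class Protected {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise checksum needs a trivially copyable type");
public:
    explicit Protected(const T& v)
        : value_(v), shadow_(v), sum_(checksum(&value_, sizeof(T))), repairs_(0) {}

    // Verify before use; repair from the shadow copy if the primary is corrupt.
    // Returns false if both copies fail the check.
    bool read(T& out) {
        if (checksum(&value_, sizeof(T)) != sum_) {
            if (checksum(&shadow_, sizeof(T)) != sum_) return false;
            value_ = shadow_;
            ++repairs_;
        }
        out = value_;
        return true;
    }

    // Update both copies and the checksum together.
    void write(const T& v) {
        value_ = v;
        shadow_ = v;
        sum_ = checksum(&value_, sizeof(T));
    }

    // Test hook: emulate a soft error by flipping one bit of the primary copy.
    void inject_bit_flip(std::size_t byte, unsigned bit) {
        reinterpret_cast<unsigned char*>(&value_)[byte] ^=
            static_cast<unsigned char>(1u << bit);
    }

    unsigned repairs() const { return repairs_; }

private:
    T value_;
    T shadow_;
    std::uint32_t sum_;
    unsigned repairs_;
};

// Usage: a flipped bit is detected on the next read and repaired.
inline bool demo_repairs_single_bit_flip() {
    Protected<ThreadState> state(ThreadState{3u, 100u});
    state.inject_bit_flip(0, 2);
    ThreadState out{};
    bool ok = state.read(out);
    assert(ok);
    return ok && out.priority == 3u && out.timeslice == 100u &&
           state.repairs() == 1u;
}
```

Checking on every read is the simplest policy, but it is also where runtime overhead comes from. A static analysis that can show certain checks to be redundant is one plausible way to keep that cost low.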


Cited By

  • (2020) Dependability Aspects in Configurable Embedded Operating Systems. Dependable Embedded Systems, pages 85-116. DOI: 10.1007/978-3-030-52017-5_4. Online publication date: 10-Dec-2020
  • (2017) Effectiveness of Software-Based Hardening for Radiation-Induced Soft Errors in Real-Time Operating Systems. Architecture of Computing Systems - ARCS 2017, pages 3-15. DOI: 10.1007/978-3-319-54999-6_1. Online publication date: 4-Mar-2017
  • (2016) CrossCheck: A Holistic Approach for Tolerating Crash-Faults and Arbitrary Failures. 2016 12th European Dependable Computing Conference (EDCC), pages 65-76. DOI: 10.1109/EDCC.2016.29. Online publication date: Sep-2016

Published In

PLOS '15: Proceedings of the 8th Workshop on Programming Languages and Operating Systems
October 2015
50 pages
ISBN:9781450339421
DOI:10.1145/2818302
  • Program Chair: Shan Lu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • German Research Foundation (DFG)

Conference

SOSP '15

Acceptance Rates

PLOS '15 Paper Acceptance Rate 7 of 16 submissions, 44%;
Overall Acceptance Rate 17 of 32 submissions, 53%
