Vinicius Fochi, L. L. Caimi, Marcelo Ruaro, E. Wächter, F. Moraes
System management recovery protocol for MPSoCs
2017 30th IEEE International System-on-Chip Conference (SOCC), published 2017-09-01
DOI: 10.1109/SOCC.2017.8226080 (https://doi.org/10.1109/SOCC.2017.8226080)
Citations: 8
Abstract
Advances in silicon technology have led to systems with hundreds of processors, the NoC-based MPSoCs. However, the higher fault probability of deep sub-micron technologies shortens the lifetime of integrated circuits. Operating systems enable distributed applications to execute on the MPSoC processing elements (PEs). Large systems require PEs dedicated to management tasks such as task mapping, handling monitoring data, and running self-awareness adaptation. This paper addresses a hierarchically organized MPSoC: PEs with an embedded operating system execute the applications (SPEs), and dedicated PEs manage the system resources at runtime (MPEs). A rich literature presents fault-tolerant proposals for the hardware and software components of the MPSoC, but there is a significant gap in fault-tolerant approaches at the system level, i.e., for the PEs whose function is to manage the system. Consider, for example, an MPE responsible for managing a set of SPEs. A fault in that MPE prevents access to its set of SPEs, so new applications cannot be executed on them. The goal of this paper is to present a method to determine when an MPE has become faulty and to propose a protocol that safely migrates the management software to an SPE. The management data is preserved without saving the context in redundant structures. The proposal is transparent to the applications executing in the system, and only a small execution overhead is observed during the management migration, as presented in the results section.
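The abstract does not say how the fault is detected or how the management data is recovered, so the following is only a minimal sketch of the general idea under stated assumptions, not the protocol proposed in the paper. It assumes a heartbeat timeout for detection, a least-loaded selection policy for the replacement PE, and a task table rebuilt from what each surviving PE reports about itself. The names `PE`, `manager_is_faulty`, and `migrate_management` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """A processing element; every field here is illustrative, not from the paper."""
    pe_id: int
    tasks: list[str] = field(default_factory=list)  # tasks running locally
    is_manager: bool = False


def manager_is_faulty(last_heartbeat: int, now: int, timeout: int) -> bool:
    """Assumed detection rule: suspect the manager after `timeout` silent ticks."""
    return now - last_heartbeat > timeout


def migrate_management(cluster: list[PE], faulty_id: int) -> tuple[PE, dict[str, int]]:
    """Promote one healthy PE to manager and rebuild the task table.

    The table is rebuilt from the task lists held by the surviving PEs, so no
    backup copy of the manager state is kept. This is an assumption about how
    state could be recovered, not the mechanism described in the paper.
    """
    survivors = [pe for pe in cluster if pe.pe_id != faulty_id]
    if not survivors:
        raise ValueError("no healthy PE left to take over management")
    # Assumed selection policy: least-loaded PE, ties broken by lowest id.
    new_manager = min(survivors, key=lambda pe: (len(pe.tasks), pe.pe_id))
    new_manager.is_manager = True
    # Map each task name to the id of the PE that currently runs it.
    task_table = {task: pe.pe_id for pe in survivors for task in pe.tasks}
    return new_manager, task_table
```

In this sketch the promoted PE keeps its own tasks; a real system would also have to decide whether to move them elsewhere before the PE takes on the management role.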