System management recovery protocol for MPSoCs

2017 30th IEEE International System-on-Chip Conference (SOCC) Pub Date : 2017-09-01 DOI:10.1109/SOCC.2017.8226080

Vinicius Fochi, L. L. Caimi, Marcelo Ruaro, E. Wächter, F. Moraes


Citations: 8

Abstract

Advances in silicon technology have led to systems with hundreds of processors, the NoC-based MPSoCs. However, the higher fault probability of deep sub-micron technologies shortens the lifetime of integrated circuits. Operating systems enable distributed applications to execute on the MPSoC processing elements (PEs). Large systems require PEs dedicated to management purposes, for example, to execute task mapping, handle monitoring data, and run self-awareness adaptation. This paper addresses a hierarchically organized MPSoC: PEs with an embedded operating system execute the applications (SpE), and dedicated PEs manage the system resources at runtime (Mpe). A rich literature presents fault-tolerant proposals for the hardware and software components of the MPSoC, but there is a significant gap in fault-tolerant approaches at the system level, i.e., approaches covering the PEs whose function is to manage the system. Consider, for example, an Mpe responsible for managing a set of SpEs. A fault in the Mpe prevents access to that set of SpEs for executing new applications. The goal of this paper is to present a method to determine when an Mpe has become faulty and to propose a protocol that safely migrates the management software to an SpE. The management data is preserved without saving the context in redundant structures. The proposal is transparent to the applications executing in the system, with a small execution overhead observed during the management migration, as presented in the results section.
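The abstract does not say how the fault is detected or how the management data is recovered, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' protocol. It assumes (hypothetically) that a faulty Mpe is detected by a missed-heartbeat timeout, that the least-loaded healthy SpE is chosen as the new manager, and that the task table is rebuilt from the state each SpE already holds locally, which is one way to avoid redundant context storage. All names (`SlavePE`, `Cluster`, `manager_is_faulty`, `migrate_management`) are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SlavePE:
    """An SpE: runs application tasks and knows its own task list."""
    pe_id: int
    running_tasks: list[str] = field(default_factory=list)
    healthy: bool = True


@dataclass
class Cluster:
    """One Mpe plus the set of SpEs it manages."""
    manager_id: int
    slaves: dict[int, SlavePE]
    last_heartbeat: float = 0.0  # time of the last sign of life from the Mpe


def manager_is_faulty(cluster: Cluster, now: float, timeout: float) -> bool:
    """Assumed detection rule: the Mpe is faulty if it has been silent too long."""
    return now - cluster.last_heartbeat > timeout


def migrate_management(cluster: Cluster) -> tuple[dict[int, list[str]], list[str]]:
    """Promote an SpE to manager and rebuild the task table from the SpEs.

    Returns the rebuilt task table and the tasks displaced from the promoted
    SpE (remapping those tasks is outside the scope of this sketch).
    """
    candidates = [s for s in cluster.slaves.values() if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy SpE available to take over management")

    # Assumed policy: the least-loaded SpE disturbs the fewest running tasks;
    # ties are broken by the lowest PE id so the choice is deterministic.
    new_manager = min(candidates, key=lambda s: (len(s.running_tasks), s.pe_id))
    displaced = list(new_manager.running_tasks)

    # The promoted PE stops being an SpE and becomes the Mpe.
    del cluster.slaves[new_manager.pe_id]
    cluster.manager_id = new_manager.pe_id

    # Management data is reconstructed from what each remaining SpE reports,
    # rather than restored from a redundant copy of the old manager's context.
    task_table = {
        s.pe_id: list(s.running_tasks)
        for s in cluster.slaves.values()
        if s.healthy
    }
    return task_table, displaced
```

In this sketch the applications on the remaining SpEs are never touched during the handover, which mirrors the transparency claim in the abstract; only the tasks on the promoted PE would need to be remapped.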



