Comparative analysis of flexible cryptographic implementations
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533901
M. Rashid, Malik Imran, A. Jafri
Flexible hardware implementations of cryptographic algorithms for real-time applications have been proposed frequently. This paper classifies the state-of-the-art research practices through a Systematic Literature Review (SLR) process. The selected studies are classified into three design categories: crypto processor, crypto coprocessor, and multicore crypto processor. Subsequently, a comparative analysis in terms of flexibility, throughput, and area is presented. This helps researchers and designers in the domain to select an appropriate design approach for a particular algorithm and/or application.
{"title":"Comparative analysis of flexible cryptographic implementations","authors":"M. Rashid, Malik Imran, A. Jafri","doi":"10.1109/ReCoSoC.2016.7533901","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533901","url":null,"abstract":"Flexible hardware implementations of cryptographic algorithms in the real time applications have been frequently proposed. This paper classifies the state-of-the-art research practices through a Systematic Literature Review (SLR) process. The selected researches have been classified into three design categories: crypto processor, crypto coprocessor and multicore crypto processor. Subsequently, comparative analysis in terms of flexibility, throughput and area is presented. It facilitates the researchers and designers of the domain to select an appropriate design approach for a particular algorithm and/or application.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121912903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EXTRA: Towards the exploitation of eXascale technology for reconfigurable architectures
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533896
D. Stroobandt, A. Varbanescu, C. Ciobanu, Muhammed Al Kadi, A. Brokalakis, George Charitopoulos, T. Todman, Xinyu Niu, D. Pnevmatikatos, Amit Kulkarni, Elias Vansteenkiste, W. Luk, M. Santambrogio, D. Sciuto, M. Huebner, Tobias Becker, G. Gaydadjiev, A. Nikitakis, A. Thom
To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with a high degree of specialization. Ideally, dynamic reconfiguration will be an intrinsic feature, so that specific HPC application features can be optimally accelerated, even if they regularly change over time. In the EXTRA project, we create a new and flexible exploration platform for developing reconfigurable architectures, design tools and HPC applications with run-time reconfiguration built-in as a core fundamental feature instead of an add-on. EXTRA covers the entire stack from architecture up to the application, focusing on the fundamental building blocks for run-time reconfigurable exascale HPC systems: new chip architectures with very low reconfiguration overhead, new tools that truly take reconfiguration as a central design concept, and applications that are tuned to maximally benefit from the proposed run-time reconfiguration techniques. Ultimately, this open platform will improve Europe's competitive advantage and leadership in the field.
{"title":"EXTRA: Towards the exploitation of eXascale technology for reconfigurable architectures","authors":"D. Stroobandt, A. Varbanescu, C. Ciobanu, Muhammed Al Kadi, A. Brokalakis, George Charitopoulos, T. Todman, Xinyu Niu, D. Pnevmatikatos, Amit Kulkarni, Elias Vansteenkiste, W. Luk, M. Santambrogio, D. Sciuto, M. Huebner, Tobias Becker, G. Gaydadjiev, A. Nikitakis, A. Thom","doi":"10.1109/ReCoSoC.2016.7533896","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533896","url":null,"abstract":"To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with a high degree of specialization. Ideally, dynamic reconfiguration will be an intrinsic feature, so that specific HPC application features can be optimally accelerated, even if they regularly change over time. In the EXTRA project, we create a new and flexible exploration platform for developing reconfigurable architectures, design tools and HPC applications with run-time reconfiguration built-in as a core fundamental feature instead of an add-on. EXTRA covers the entire stack from architecture up to the application, focusing on the fundamental building blocks for run-time reconfigurable exascale HPC systems: new chip architectures with very low reconfiguration overhead, new tools that truly take reconfiguration as a central design concept, and applications that are tuned to maximally benefit from the proposed run-time reconfiguration techniques. Ultimately, this open platform will improve Europe's competitive advantage and leadership in the field.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128702041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ERRCA: A buffer-efficient reconfigurable optical Network-on-Chip with permanent-error recognition
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533909
Wolfgang Büter, Dominic Oehlert, A. Ortiz
Optical on-chip communication technology provides unprecedented bandwidth. It makes it possible to connect the hundreds or even thousands of processing elements expected in many-core systems using an optical Network-on-Chip. However, the buffers required to interface the electrical and optical layers are very large, since the optical data flow cannot be stored. Moreover, on-chip optical technologies have high defect rates, which severely limits their usability. In order to address these challenges, this work presents a buffer-efficient reconfigurable optical Network-on-Chip with permanent-error recognition. The buffer efficiency is achieved by a global credit-based arbitration with optical tokens. Furthermore, the architecture autonomously detects permanent errors in the optical components and configures the communication paths to avoid them. The work provides a thorough gate-level analysis of the area overhead incurred by the electrical sub-modules of the proposed system and shows the practicability of the approach, experimentally validated on an FPGA prototype. Compared with previously reported optical networks, it achieves an area reduction of up to 80% with almost identical performance.
{"title":"ERRCA: A buffer-efficient reconfigurable optical Network-on-Chip with permanent-error recognition","authors":"Wolfgang Büter, Dominic Oehlert, A. Ortiz","doi":"10.1109/ReCoSoC.2016.7533909","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533909","url":null,"abstract":"Optical on-chip communication technology provides an unprecedented bandwidth. It allows to connect the hundreds or even thousands of processing elements expected in many core systems using optical Network-on-Chip. However, the required buffers to interface the electrical and optical layers are very large, since optical data-flow cannot be stored. Moreover, on-chip optical technologies have high defect rates which limits its usability severely. In order to address these challenges, this work presents a buffer-efficient reconfigurable optical Network-on-Chip with permanent-error recognition. The buffer-efficiency is achieved by a global credit-based arbitration with optical tokens. Further on, the architecture autonomously detects permanent errors in the optical components and configures the communication paths to avoid them. The work provides a thorough analysis at the gate-level of the area overhead incurred by the electrical sub-modules of the proposed system. It shows the practicability of the approach, experimental validated on a FPGA prototype. Compared with previously reported optical networks, it achieves an area reduction of up to 80% with almost identical performance.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128611782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient bandwidth regulation at memory controller for mixed criticality applications
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533902
G. Tsamis, S. Kavvadias, A. Papagrigoriou, M. Grammatikakis, Kyprianos Papademetriou
We design a bandwidth regulation module by adapting and extending the algorithm of the MemGuard Linux kernel module for hardware implementation. Our extensions differentiate among NoC sources with rate-constrained and best-effort traffic provisions, support a violation-free guaranteed operating mode for rate-constrained flows, and support dynamic adaptivity through EWMA prediction. Our strategies enhance support for mixed-criticality applications on MPSoCs. C++-based statistical simulation shows improvements over a hardware adaptation of the original MemGuard algorithm without our extensions. Using SystemC, we further evaluate MemGuard at the memory controller of a NoC-based SoC model using an MPEG4 traffic model, and compare its hardware cost, using synthesis from Xilinx Vivado HLS and Vivado, with ARM AMBA AXI4 and a 4×4 STNoC instance.
{"title":"Efficient bandwidth regulation at memory controller for mixed criticality applications","authors":"G. Tsamis, S. Kavvadias, A. Papagrigoriou, M. Grammatikakis, Kyprianos Papademetriou","doi":"10.1109/ReCoSoC.2016.7533902","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533902","url":null,"abstract":"We design a bandwidth regulation module, by adapting and extending the algorithm of MemGuard Linux kernel module for hardware implementation. Our extensions differentiate among NoC sources with rate-constrained and best-effort traffic provisions, support a violation free-guaranteed operating mode for rate-constrained flows, and support dynamic adaptivity through EWMA prediction. Our strategies enhance support for mixed criticality applications on MPSoCs. C++-based statistical simulation shows improvements over hardware adaptation of the original MemGuard algorithm without our extensions. Using SystemC, we further evaluate MemGuard at the memory controller of a NoC-based SoC model using an MPEG4 traffic model and compare its hardware cost using synthesis from Xilinx Vivado HLS and Vivado, with ARM AMBA AXI4 and a 4×4 STNoC instance.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121054125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An ultra-low energy PUF matching security platform using programmable delay lines
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533899
T. Xu, Hongxiang Gu, M. Potkonjak
We propose a new security platform: physical unclonable function (PUF) matching using programmable delay lines (PDLs). Our platform inherits the good security properties of standard PUFs, such as low energy, low delay, and unclonability. Standard PUF-based security protocols, however, impose high computational demands on at least one of the involved parties. To resolve this issue, we take advantage of PDL technology to match standard PUFs in such a way that two PUFs have the same challenge-response mapping. The matched pair of PUFs enables a majority of protocols to be executed in an ultra-low-energy, low-latency manner for all involved parties.
{"title":"An ultra-low energy PUF matching security platform using programmable delay lines","authors":"T. Xu, Hongxiang Gu, M. Potkonjak","doi":"10.1109/ReCoSoC.2016.7533899","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533899","url":null,"abstract":"We have proposed a new security platform: physical unclonable function (PUF) matching using programmable delay lines (PDL). Our platform inherits good security properties of standard PUFs, such as low energy, low delay, and unclonability. However, standard PUF-based security protocols induce high computational resources of at least one involved party. To resolve this issue, we take advantage of PDL technology to match standard PUFs in such a way that two PUFs have the same challenge response mapping function. The matched pair of PUFs enables a majority of protocols to be executed in an ultra low energy, low latency manner for all the involved parties.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116201409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Verifying worst-case completion times for reconfigurable hardware modules using proof-carrying hardware
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533910
T. Wiersema, M. Platzner
Runtime reconfiguration can be used to replace hardware modules in the field and even to continuously improve them during operation. Runtime reconfiguration poses new challenges for validation, since the required properties of newly arriving modules may be difficult to check fast enough to sustain the intended system dynamics. In this paper we present a method for just-in-time verification of the worst-case completion time of a reconfigurable hardware module. We assume so-called run-to-completion modules that expose start and done signals indicating the start and end of execution, respectively. We present a formal verification approach that exploits the concept of proof-carrying hardware. The approach tasks the creator of a hardware module with constructing a proof of the worst-case completion time, which can then easily be checked by the user of the module just prior to reconfiguration. After explaining the verification approach and a corresponding tool flow, we present results from two case studies, a short-term synthesis filter and a multihead weigher. The results clearly show that the cost of verifying the completion time of a module is paid by its creator rather than its user.
{"title":"Verifying worst-case completion times for reconfigurable hardware modules using proof-carrying hardware","authors":"T. Wiersema, M. Platzner","doi":"10.1109/ReCoSoC.2016.7533910","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533910","url":null,"abstract":"Runtime reconfiguration can be used to replace hardware modules in the field and even to continuously improve them during operation. Runtime reconfiguration poses new challenges for validation, since the required properties of newly arriving modules may be difficult to check fast enough to sustain the intended system dynamics. In this paper we present a method for just-in-time verification of the worst-case completion time of a reconfigurable hardware module. We assume so-called run-to-completion modules that exhibit start and done signals indicating the start and end of execution, respectively. We present a formal verification approach that exploits the concept of proof-carrying hardware. The approach tasks the creator of a hardware module with constructing a proof of the worst-case completion time, which can then easily be checked by the user of the module, just prior to reconfiguration. After explaining the verification approach and a corresponding tool flow, we present results from two case studies, a short term synthesis filter and a multihead weigher. The results clearly show that cost of verifying the completion time of the module is paid by the creator instead of the user of the module.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134435524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speed and accuracy dilemma in NoC simulation: What about memory impact?
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533893
Manuel Selva, A. Gamatie, D. Novo, G. Sassatelli
Network-on-Chip (NoC) communication infrastructures are increasingly being used in modern manycore architectures. Many industrial and research NoC simulators have been proposed in recent years to facilitate the design of such communication infrastructures. Like any simulator, all of them have to trade off speed against accuracy. Simulation time directly depends on the simulation accuracy. It also directly depends on the complexity of the system to be simulated, e.g., the number of cores and their individual complexity. In this work, we show that the memory footprint of NoC simulators can be a serious factor limiting the simulation of manycore architectures with a large number of cores. We first quantitatively compare the memory footprint of a transaction-level modeling NoC simulator and its cycle-accurate counterpart to show that memory footprint is a concern. Then, we show that memory footprint is also largely impacted by the choice of programming abstraction by comparing two cycle-accurate simulators written using different application programming interfaces, i.e., plain C++ and SystemC.
{"title":"Speed and accuracy dilemma in NoC simulation: What about memory impact?","authors":"Manuel Selva, A. Gamatie, D. Novo, G. Sassatelli","doi":"10.1109/ReCoSoC.2016.7533893","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533893","url":null,"abstract":"Network on Chip (NoC) communication infrastructures are increasingly being used in modern manycore architectures. Many industrial and research NoC simulators have been proposed in the last years in order to facilitate the design of such communication infrastructures. As any simulator, all of them have to trade off speed and accuracy. Simulation time directly depends on the simulation accuracy. It also directly depends on the complexity of the system to be simulated, e.g., the number of cores and their unit complexity. In this work, we show that the memory footprint of NoC simulators can be a serious factor limiting the simulation of manycore architectures with a large number of cores. We first quantitatively compare the memory footprint of a transactional level modeling NoC simulator and its cycle-accurate counterpart to show that memory footprint is a concern. Then, we show that memory footprint is also largely impacted by the choice of the programming abstraction by comparing two cycle-accurate simulators written using different application programming interfaces, i.e., plain C++ and SystemC.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120905543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Address interleaving for low-cost NoCs
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533892
M. Grammatikakis, Kyprianos Papademetriou, P. Petrakis, M. Coppola, Michael Soulie
New generations of NoC-based platforms incorporate address interleaving, which enables balancing transactions between the memory nodes. The memory space is distributed over different nodes of the NoC and is accessed alternately by each on-chip initiator; the memory node to be accessed is selected from the transaction request address through a memory map. Interleaving can allow for efficient use of NoC bandwidth and congestion reduction, and we study whether its gains scale with system size. In this work we concentrate on an instance of a customizable point-to-point interconnect from STMicroelectronics called STNoC. We first evaluate a setup with 4 CPU initiators and 4 memories, and show that interleaving relieves the NoC from congestion and permits higher packet injection rates. We also show that this depends on the number of packets sent per transaction by an initiator before changing the destination memory node; this is called the interleaving step. We then enrich the setup with several DMA engines, in accordance with industry roadmaps, and experiment with MPSoCs having up to 32 nodes and with various link widths of the STNoC. When the link width is 32 bytes, the aggregate throughput gain from address interleaving is 20.8%, but when we set it to 8 bytes the throughput gain reaches 69.64%. This implies silicon savings in SoCs, as it is not always necessary to configure NoCs with wide links.
{"title":"Address interleaving for low-cost NoCs","authors":"M. Grammatikakis, Kyprianos Papademetriou, P. Petrakis, M. Coppola, Michael Soulie","doi":"10.1109/ReCoSoC.2016.7533892","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533892","url":null,"abstract":"New generations of NoC-based platforms incorporate address interleaving, which enables balancing transactions between the memory nodes. The memory space is distributed in different nodes of the NoC, and accessed alternately by each on-chip initiator. A memory node is accessed depending on the transaction request address through a memory map. Interleaving can allow for efficient use of NoC bandwidth and congestion reduction, and we study whether its gains scale over system size. In this work we concentrate on an instance of a customizable point-to-point interconnect from STMicroelectronics called STNoC. We first evaluate a setup with 4 CPU initiators and 4 memories, and show that interleaving relieves the NoC from congestion and permits higher packet injection rates. We also show that this depends on the number of packets sent per transaction by an initiator prior to changing destination memory node; this is called interleaving step. We then enriched the setup with several DMA engines, which is in accordance with industry roadmap. We experimented with MPSoCs having up to 32-nodes and for various link-widths of the STNoC. When link-width was 32 Bytes, the aggregate throughput gain from address interleaving was 20.8%, but when we set it 8 Bytes the throughput gain reached 69.64%. This implies silicon savings in SoCs, as it is not always necessary to configure NoCs with wide link-widths.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124100157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reconfiguration in FPGA-based multi-core platforms for hard real-time applications
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533895
Luca Pezzarossa, Martin Schoeberl, J. Sparsø
In general-purpose multi-core computing platforms, hardware accelerators and reconfiguration are means to improve performance, i.e., the average-case execution time of a software application. In hard real-time systems, such average-case speed-up is not in itself relevant: it is the worst-case execution time of the tasks of an application that determines the system's ability to respond in time. To support this focus, the platform must provide service guarantees for both communication and computation resources. In addition, many hard real-time applications have multiple modes of operation, and each mode has specific requirements. An interesting perspective on reconfigurable computing is to exploit run-time reconfiguration to support mode changes. In this paper we explore approaches to the reconfiguration of communication and computation resources in the T-CREST hard real-time multi-core platform. The reconfiguration of communication resources is supported by extending the message-passing network-on-chip with capabilities for setting up, tearing down, and modifying the bandwidth of virtual circuits. The reconfiguration of computation resources, such as hardware accelerators, is performed using the dynamic partial reconfiguration capabilities found in modern FPGAs.
{"title":"Reconfiguration in FPGA-based multi-core platforms for hard real-time applications","authors":"Luca Pezzarossa, Martin Schoeberl, J. Sparsø","doi":"10.1109/ReCoSoC.2016.7533895","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533895","url":null,"abstract":"In general-purpose computing multi-core platforms, hardware accelerators and reconfiguration are means to improve performance; i.e., the average-case execution time of a software application. In hard real-time systems, such average-case speed-up is not in itself relevant - it is the worst-case execution-time of tasks of an application that determines the systems ability to respond in time. To support this focus, the platform must provide service guarantees for both communication and computation resources. In addition, many hard real-time applications have multiple modes of operation, and each mode has specific requirements. An interesting perspective on reconfigurable computing is to exploit run-time reconfiguration to support mode changes. In this paper we explore approaches to reconfiguration of communication and computation resources in the T-CREST hard real-time multi-core platform. The reconfiguration of communication resources is supported by extending the message-passing network-on-chip with capabilities for setting up, tearing down, and modifying the bandwidth of virtual circuits. The reconfiguration of computation resources, such as hardware accelerators, is performed using the dynamic partial reconfiguration capabilities found in modern FPGAs.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133627177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Postponing wearout failures in chip multiprocessors using thermal management and thread migration
Pub Date: 2016-06-27 | DOI: 10.1109/ReCoSoC.2016.7533906
E. Kashefi, H. Zarandi, A. Gordon-Ross
This paper presents an improved method to postpone wearout failures and improve functional-unit and overall system lifetime by considering two important wearout factors: temperature and functional-unit usage. Our method provides a more fine-grained approach than prior methods by considering individual functional-unit usage. Using this information, system behavior can be predicted and appropriate thread scheduling and migration decisions can be made. Our method incorporates temperature predictions based on recent temperature history and functional-unit usage to rank threads and cores in a chip multiprocessor (CMP). Using these rankings, our method migrates threads among cores to reduce thermal hotspots. Simulation results on the ESESC simulator show that our method can improve the average system temperature and lifetime by approximately 4.33°C and 21.65%, respectively, in a tri-core CMP, and by 6.4°C and 32% in a quad-core CMP.
{"title":"Postponing wearout failures in chip multiprocessors using thermal management and thread migration","authors":"E. Kashefi, H. Zarandi, A. Gordon-Ross","doi":"10.1109/ReCoSoC.2016.7533906","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533906","url":null,"abstract":"This paper presents an improved method to postpone wearout failures and improves functional unit and entire system lifetime by considering two important wearout factors: temperature and functional unit usage. Our method provides a more fine grained approach as compared to prior methods by considering individual functional unit usage. Using this information, system behavior can be predicted and appropriate thread scheduling and migration decisions can be made. Our method incorporates temperature predictions based on recent historical temperatures and functional unit usages to rank threads and cores in a chip multiprocessor (CMP). Using these rankings, our method migrates threads among cores to reduce thermal hotspots. Simulation results on the ESESC simulator show that our method can improve the average system temperature and lifetime by approximately 4.33°C and 21.65%,respectively,in a tri-core CMP, and 6.4°C and 32% in a quad-core CMP.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123798746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}