The physical dimensions of standard cells constrain the dimensions of power networks, affecting the on-chip power noise. An exploratory modeling methodology is presented for estimating power noise in advanced technology nodes. The models are evaluated for 14, 10, and 7 nm technologies to assess the impact on performance. Scaled technologies are shown to be more sensitive to power noise, resulting in potential loss of performance enhancements achieved by scaling. Stripes between local track rails is evaluated as a means to reduce power noise, exhibiting up to 56.5% improvement in power noise for the 7 nm technology node. A strong dependence on the width of a stripe is observed, indicating that fewer wide stripes are more favorable then many thin stripes. As a promising alternative material for power network interconnects, graphene is shown to exhibit good potential in reducing power noise. The effects of different scaling scenarios of local power rails on power noise are also discussed.
{"title":"Exploratory power noise models of standard cell 14, 10, and 7 nm FinFET ICs","authors":"Ravi Patel, Kan Xu, E. Friedman, P. Raghavan","doi":"10.1145/2902961.2903035","DOIUrl":"https://doi.org/10.1145/2902961.2903035","url":null,"abstract":"The physical dimensions of standard cells constrain the dimensions of power networks, affecting the on-chip power noise. An exploratory modeling methodology is presented for estimating power noise in advanced technology nodes. The models are evaluated for 14, 10, and 7 nm technologies to assess the impact on performance. Scaled technologies are shown to be more sensitive to power noise, resulting in potential loss of performance enhancements achieved by scaling. Stripes between local track rails is evaluated as a means to reduce power noise, exhibiting up to 56.5% improvement in power noise for the 7 nm technology node. A strong dependence on the width of a stripe is observed, indicating that fewer wide stripes are more favorable then many thin stripes. As a promising alternative material for power network interconnects, graphene is shown to exhibit good potential in reducing power noise. The effects of different scaling scenarios of local power rails on power noise are also discussed.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126350450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The internet of things (IoT) is quickly emerging as the next major domain for embedded computer systems. Although the term IoT could be defined in a variety of different ways, IoT always encompasses typically ordinary devices (e.g., thermostats and kitchen appliances) augmented with computational power that allows regular communication via the internet. Given the simplicity of typical IoT devices, their on-board computer systems must also be simple in the sense that they be small and consume minimal power. However, the IoT itself presents new privacy and security concerns that must be considered when designing IoT devices. In order to provide robust security with minimal area and power overhead, it is the premise of this paper that IoT security be implemented using nanoelectronic security primitives and nano-enabled security protocols. Such nanoscale security primitives are expected to utilize a very small amount of area and consume a negligible amount of power, all while providing the required levels of security. This paper presents some examples of nanoelectronic security primitives and discusses how such circuits and systems can be of use for inclusion in emerging IoT devices.
{"title":"Security meets nanoelectronics for Internet of things applications","authors":"G. Rose","doi":"10.1145/2902961.2903045","DOIUrl":"https://doi.org/10.1145/2902961.2903045","url":null,"abstract":"The internet of things (IoT) is quickly emerging as the next major domain for embedded computer systems. Although the term IoT could be defined in a variety of different ways, IoT always encompasses typically ordinary devices (e.g., thermostats and kitchen appliances) augmented with computational power that allows regular communication via the internet. Given the simplicity of typical IoT devices, their on-board computer systems must also be simple in the sense that they be small and consume minimal power. However, the IoT itself presents new privacy and security concerns that must be considered when designing IoT devices. In order to provide robust security with minimal area and power overhead, it is the premise of this paper that IoT security be implemented using nanoelectronic security primitives and nano-enabled security protocols. Such nanoscale security primitives are expected to utilize a very small amount of area and consume a negligible amount of power, all while providing the required levels of security. This paper presents some examples of nanoelectronic security primitives and discusses how such circuits and systems can be of use for inclusion in emerging IoT devices.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126585468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale MPSoCs requires a scalable and dynamic real-time (RT) task scheduler, able to handle non-deterministic computational behaviors. Current proposals for MPSoCs have limitations, as lack of scalability, complex static steps, validation with abstract models, or are not flexible to enable changes at runtime of the RT constraints. This work proposes a hierarchical task scheduler with monitoring features. The scheduler is dynamic, supporting changes in RT constraints at runtime. An API enables these features allowing to the application developer to reconfigure the tasks' period, deadline, and execution time by annotating the task code. At runtime, according to the task execution, the scheduler handles the API calls and adjust itself to ensure RT guarantees according to the new constraints. Scalability is ensured by dividing the scheduler into two hierarchical levels: LS (Local Schedulers), and CS (Cluster Schedulers). The LS runs at the processor level, using the LST (Least Slack-Time) algorithm. The CS runs at the cluster level, i.e., a group of processors controlled by a manager processor. The CS receives messages from the LSs, informing the processor slack-time, deadline violations, and RT changes. The CS implements an RT adaptation heuristic, triggering task migrations according to RT reconfiguration or deadline misses. Results show a negligible overhead in the applications' execution time and the fulfillment of the applications' RT constraints even with a high degree of resources sharing, in both processors and NoC.
{"title":"Dynamic real-time scheduler for large-scale MPSoCs","authors":"Marcelo Ruaro, F. Moraes","doi":"10.1145/2902961.2903027","DOIUrl":"https://doi.org/10.1145/2902961.2903027","url":null,"abstract":"Large-scale MPSoCs requires a scalable and dynamic real-time (RT) task scheduler, able to handle non-deterministic computational behaviors. Current proposals for MPSoCs have limitations, as lack of scalability, complex static steps, validation with abstract models, or are not flexible to enable changes at runtime of the RT constraints. This work proposes a hierarchical task scheduler with monitoring features. The scheduler is dynamic, supporting changes in RT constraints at runtime. An API enables these features allowing to the application developer to reconfigure the tasks' period, deadline, and execution time by annotating the task code. At runtime, according to the task execution, the scheduler handles the API calls and adjust itself to ensure RT guarantees according to the new constraints. Scalability is ensured by dividing the scheduler into two hierarchical levels: LS (Local Schedulers), and CS (Cluster Schedulers). The LS runs at the processor level, using the LST (Least Slack-Time) algorithm. The CS runs at the cluster level, i.e., a group of processors controlled by a manager processor. The CS receives messages from the LSs, informing the processor slack-time, deadline violations, and RT changes. The CS implements an RT adaptation heuristic, triggering task migrations according to RT reconfiguration or deadline misses. Results show a negligible overhead in the applications' execution time and the fulfillment of the applications' RT constraints even with a high degree of resources sharing, in both processors and NoC.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115942787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New hardware security threats are identified in emerging three-dimensional (3D) integrated circuits (ICs) and potential counter-measures are introduced. Trigger and payload mechanisms for future 3D hardware Trojans are predicted. Furthermore, a novel, network-on-chip based 3D obfuscation method is proposed to block the direct communication between two commercial dies in a 3D structure, thus thwarting reverse engineering attacks on the vertical dimension. Simulation results demonstrate that the proposed method effectively obfuscates the cross-plane communication by increasing the reverse engineering time by approximately 5× as compared to using direct through silicon via (TSV) connections. The proposed method consumes approximately one fifth the area and power of a typical network-on-chip designed in a 65 nm technology, exhibiting limited overhead.
{"title":"Hardware security threats and potential countermeasures in emerging 3D ICs","authors":"Jaya Dofe, Qiaoyan Yu, Hailang Wang, E. Salman","doi":"10.1145/2902961.2903014","DOIUrl":"https://doi.org/10.1145/2902961.2903014","url":null,"abstract":"New hardware security threats are identified in emerging three-dimensional (3D) integrated circuits (ICs) and potential counter-measures are introduced. Trigger and payload mechanisms for future 3D hardware Trojans are predicted. Furthermore, a novel, network-on-chip based 3D obfuscation method is proposed to block the direct communication between two commercial dies in a 3D structure, thus thwarting reverse engineering attacks on the vertical dimension. Simulation results demonstrate that the proposed method effectively obfuscates the cross-plane communication by increasing the reverse engineering time by approximately 5× as compared to using direct through silicon via (TSV) connections. The proposed method consumes approximately one fifth the area and power of a typical network-on-chip designed in a 65 nm technology, exhibiting limited overhead.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116623218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Circuit obfuscation techniques have been proposed to conceal circuit's functionality in order to thwart reverse engineering (RE) attacks to integrated circuits (IC). We believe that a good obfuscation method should have low design complexity and low performance overhead, yet, causing high RE attack complexity. However, existing obfuscation techniques do not meet all these requirements. In this paper, we propose a polynomial obfuscation scheme which leverages special designed multiplexers (MUXs) to replace judiciously selected logic gates. Candidate to-be-obfuscated logic gates are selected based on a novel gate classification method which utilizes IC topological structure information. We show that this scheme is resilient to all the known attacks, hence it is secure. Experiments are conducted on ISCAS 85/89 and MCNC benchmark suites to evaluate the performance overhead due to obfuscation.
{"title":"Secure and low-overhead circuit obfuscation technique with multiplexers","authors":"Xueyan Wang, Xiaotao Jia, Qiang Zhou, Yici Cai, Jianlei Yang, Mingze Gao, G. Qu","doi":"10.1145/2902961.2903000","DOIUrl":"https://doi.org/10.1145/2902961.2903000","url":null,"abstract":"Circuit obfuscation techniques have been proposed to conceal circuit's functionality in order to thwart reverse engineering (RE) attacks to integrated circuits (IC). We believe that a good obfuscation method should have low design complexity and low performance overhead, yet, causing high RE attack complexity. However, existing obfuscation techniques do not meet all these requirements. In this paper, we propose a polynomial obfuscation scheme which leverages special designed multiplexers (MUXs) to replace judiciously selected logic gates. Candidate to-be-obfuscated logic gates are selected based on a novel gate classification method which utilizes IC topological structure information. We show that this scheme is resilient to all the known attacks, hence it is secure. Experiments are conducted on ISCAS 85/89 and MCNC benchmark suites to evaluate the performance overhead due to obfuscation.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131762941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A technique for sampling clock skew correction by adjusting the delay in the input signal to each channel in a time-interleaved (TI) ADC is proposed. A proof-of-concept TI ADC employing this technique was implemented in a 65 nm CMOS process. The four-way TI ADC operates at an effective sampling rate of 150 MS/s, and achieves 60.2 dB and 58.2 dB SNDR for an input signal frequency of 2.1 MHz and 74.1 MHz, respectively. The ADC consumes 12.4 mW from a 1.2 V supply and occupies an area of 0.9 mm2.
{"title":"A sampling clock skew correction technique for time-interleaved SAR ADCs","authors":"D. Prashanth, Hae-Seung Lee","doi":"10.1145/2902961.2903008","DOIUrl":"https://doi.org/10.1145/2902961.2903008","url":null,"abstract":"A technique for sampling clock skew correction by adjusting the delay in the input signal to each channel in a time-interleaved (TI) ADC is proposed. A proof-of-concept TI ADC employing this technique was implemented in a 65 nm CMOS process. The four-way TI ADC operates at an effective sampling rate of 150 MS/s, and achieves 60.2 dB and 58.2 dB SNDR for an input signal frequency of 2.1 MHz and 74.1 MHz, respectively. The ADC consumes 12.4 mW from a 1.2 V supply and occupies an area of 0.9 mm2.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"33 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131992150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dimitrios Stamoulis, S. Corbetta, D. Rodopoulos, P. Weckx, P. Debacker, B. Meyer, B. Kaczer, P. Raghavan, D. Soudris, F. Catthoor, Z. Zilic
Atomistic-based approaches accurately model Bias Temperature Instability phenomena, but they suffer from prolonged execution times, preventing their seamless integration in system-level analysis flows. In this paper we present a comprehensive flow that combines the accuracy of Capture Emission Time (CET) maps with the efficiency of the Compact Digital Waveform (CDW) representation. That way, we capture the true workload-dependent BTI-induced degradation of selected CPU components. First, we show that existing works that assume constant stress patterns fail to account for workload dependency leading to fundamental estimation errors. Second, we evaluate the impact of different real workloads on selected CPU sub-blocks from a commercial processor design. To the best of our knowledge, this is the first work that combines atomistic property and true workload-dependency for variability analysis.
{"title":"Capturing true workload dependency of BTI-induced degradation in CPU components","authors":"Dimitrios Stamoulis, S. Corbetta, D. Rodopoulos, P. Weckx, P. Debacker, B. Meyer, B. Kaczer, P. Raghavan, D. Soudris, F. Catthoor, Z. Zilic","doi":"10.1145/2902961.2902992","DOIUrl":"https://doi.org/10.1145/2902961.2902992","url":null,"abstract":"Atomistic-based approaches accurately model Bias Temperature Instability phenomena, but they suffer from prolonged execution times, preventing their seamless integration in system-level analysis flows. In this paper we present a comprehensive flow that combines the accuracy of Capture Emission Time (CET) maps with the efficiency of the Compact Digital Waveform (CDW) representation. That way, we capture the true workload-dependent BTI-induced degradation of selected CPU components. First, we show that existing works that assume constant stress patterns fail to account for workload dependency leading to fundamental estimation errors. Second, we evaluate the impact of different real workloads on selected CPU sub-blocks from a commercial processor design. To the best of our knowledge, this is the first work that combines atomistic property and true workload-dependency for variability analysis.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116560646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To fully exploit the massive parallelism of many cores, this work tackles the problem of mapping large-scale applications onto heterogeneous on-chip networks (NoCs) to minimize the peak workload for energy hotspot avoidance. A task-resource co-optimization framework is proposed which configures the on-chip communication infrastructure and maps the applications simultaneously and coherently, aiming to minimize the peak load under the constraints of computation power and communication capacity and a total cost budget of on-chip resources. The problem is first formulated into a linear programming model to search for optimal solution. A heuristic algorithm is further developed for fast design space exploration in extremely large-scale many-core NoCs. Extensive simulations are carried out under real-world benchmarks and randomly generated task graphs to demonstrate the effectiveness and efficiency of the proposed schemes.
{"title":"Task-resource co-allocation for hotspot minimization in heterogeneous many-core NoCs","authors":"Md Farhadur Reza, Dan Zhao, Hongyi Wu","doi":"10.1145/2902961.2903003","DOIUrl":"https://doi.org/10.1145/2902961.2903003","url":null,"abstract":"To fully exploit the massive parallelism of many cores, this work tackles the problem of mapping large-scale applications onto heterogeneous on-chip networks (NoCs) to minimize the peak workload for energy hotspot avoidance. A task-resource co-optimization framework is proposed which configures the on-chip communication infrastructure and maps the applications simultaneously and coherently, aiming to minimize the peak load under the constraints of computation power and communication capacity and a total cost budget of on-chip resources. The problem is first formulated into a linear programming model to search for optimal solution. A heuristic algorithm is further developed for fast design space exploration in extremely large-scale many-core NoCs. Extensive simulations are carried out under real-world benchmarks and randomly generated task graphs to demonstrate the effectiveness and efficiency of the proposed schemes.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114271766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents the design of a non-volatile register file using cells made of a SRAM and a Programmable Metallization Cell (PMC). The proposed cell is a symmetric 8T2P (8-transistors, 2PMC) design; it utilizes three control lines to ensure the correctness in its operations (i.e. Write, Read, Store and Restore). Simulation results using HSPICE are provided for the cell as well as the register file array (both one- and two-dimensional schemes). At cell level, it is shown that the off-state resistance has a limited effect on the Read time, because in the proposed circuit the transistor connecting the PMCs to the SRAM is off. While having no significant effect on the Store time, the time of the Restore operation depends on the value of the off-state resistance, i.e. an increase in off-state PMC resistance causes an increase in Restore time. Comparison between non-volatile register files utilizing either PMCs, or Phase Change Memories (PCMs) is provided.The register file using PMCs has a faster Store and Read times than the PCM-based counterpart; this is mostly caused by the difference in resistance values for these two non-volatile technologies. The lower delay involved in these operations confirms that the proposed PMC-based register file offers significant advantages in terms of delay performance.
{"title":"A design of a non-volatile PMC-based (programmable metallization cell) register file","authors":"Salin Junsangsri, Jie Han, F. Lombardi","doi":"10.1145/2902961.2903034","DOIUrl":"https://doi.org/10.1145/2902961.2903034","url":null,"abstract":"This paper presents the design of a non-volatile register file using cells made of a SRAM and a Programmable Metallization Cell (PMC). The proposed cell is a symmetric 8T2P (8-transistors, 2PMC) design; it utilizes three control lines to ensure the correctness in its operations (i.e. Write, Read, Store and Restore). Simulation results using HSPICE are provided for the cell as well as the register file array (both one- and two-dimensional schemes). At cell level, it is shown that the off-state resistance has a limited effect on the Read time, because in the proposed circuit the transistor connecting the PMCs to the SRAM is off. While having no significant effect on the Store time, the time of the Restore operation depends on the value of the off-state resistance, i.e. an increase in off-state PMC resistance causes an increase in Restore time. Comparison between non-volatile register files utilizing either PMCs, or Phase Change Memories (PCMs) is provided.The register file using PMCs has a faster Store and Read times than the PCM-based counterpart; this is mostly caused by the difference in resistance values for these two non-volatile technologies. The lower delay involved in these operations confirms that the proposed PMC-based register file offers significant advantages in terms of delay performance.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114686259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negative bias temperature instability (NBTI) has emerged as a critical challenge to lifetime reliability of computing systems. Traditionally, temperature-aware methodologies are used to mitigate the impact of NBTI on aging and degradation of computing systems. However, in the presence of process variation, which is the norm in manycore processors, temperature-aware techniques are inefficient in improving lifetime reliability and can result in poor performance. In this paper, we propose a novel performance constraint-aware task mapping technique to improve lifetime reliability by mitigating NBTI considering on-chip process variation. Our approach consists of two phases, namely design-time and run-time. During design time, we generate Pareto-optimal mappings. Following which, our run-time technique judiciously intervenes to perform workload migration to save the weakest processing core. We compare our approach with performance-greedy and thermal-aware task mapping techniques. The experiment results demonstrate that our approach outperforms other two techniques and improves lifetime reliability of a manycore system as much as 54% without violating the throughput constraint.
{"title":"Performance constraint-aware task mapping to optimize lifetime reliability of manycore systems","authors":"Vijeta Rathore, Vivek Chaturvedi, T. Srikanthan","doi":"10.1145/2902961.2902996","DOIUrl":"https://doi.org/10.1145/2902961.2902996","url":null,"abstract":"Negative bias temperature instability (NBTI) has emerged as a critical challenge to lifetime reliability of computing systems. Traditionally, temperature-aware methodologies are used to mitigate the impact of NBTI on aging and degradation of computing systems. However, in the presence of process variation, which is the norm in manycore processors, temperature-aware techniques are inefficient in improving lifetime reliability and can result in poor performance. In this paper, we propose a novel performance constraint-aware task mapping technique to improve lifetime reliability by mitigating NBTI considering on-chip process variation. Our approach consists of two phases, namely design-time and run-time. During design time, we generate Pareto-optimal mappings. Following which, our run-time technique judiciously intervenes to perform workload migration to save the weakest processing core. We compare our approach with performance-greedy and thermal-aware task mapping techniques. The experiment results demonstrate that our approach outperforms other two techniques and improves lifetime reliability of a manycore system as much as 54% without violating the throughput constraint.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"210 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121359523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}