A robust digital sensor IP and sensor insertion flow for in-situ path timing slack monitoring in SoCs
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116292 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
Mehdi Sadi, L. Winemberg, M. Tehranipoor
Because of process variations, the critical and near-critical paths in silicon differ from those identified in the pre-silicon stage. It has therefore become necessary to extract timing slack information from circuit paths in the post-silicon phase. In this paper, we present a robust digital sensor IP for in-situ timing slack monitoring on actual circuit paths in SoCs. The timing slack data is converted into a digital format and stored in a dedicated scan register chain, so it can be extracted at any point in time during test and functional modes. A novel layout-aware, netlist-level sensor insertion flow is proposed. The sensor IP has been designed with a 32/28nm standard cell library, and its performance is demonstrated in the physical design of several benchmark circuits.
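The abstract does not describe the sensor circuit itself. As a rough illustration of how a path's slack can become a digital code held in a scan register, the sketch below assumes a common delay-line style: the path output is delayed by successive buffer taps, each tap is sampled at the clock edge, and the number of taps that still arrive in time forms a thermometer code. `TAP_DELAY_PS`, `NUM_TAPS`, and all function names are hypothetical, not the paper's design.

```python
from typing import List

TAP_DELAY_PS = 10  # hypothetical delay of one buffer stage in the sensor's delay line
NUM_TAPS = 8       # hypothetical number of taps (width of the slack code)


def capture_thermometer(path_delay_ps: float, clock_period_ps: float) -> List[int]:
    """Bit k-1 is 1 if the path's transition, delayed by k extra taps,
    still arrives no later than the capturing clock edge."""
    return [
        1 if path_delay_ps + tap * TAP_DELAY_PS <= clock_period_ps else 0
        for tap in range(1, NUM_TAPS + 1)
    ]


def decode_slack(code: List[int]) -> int:
    """Lower bound on slack in ps: number of passing taps times the tap delay
    (saturates at NUM_TAPS * TAP_DELAY_PS; negative slack reads as 0)."""
    return sum(code) * TAP_DELAY_PS


def shift_out(scan_register: List[int]) -> List[int]:
    """Serially unload the dedicated scan register: each shift clock moves every
    bit one flop toward scan-out, so the last flop's bit appears first."""
    chain = list(scan_register)
    unloaded = []
    for _ in range(len(chain)):
        unloaded.append(chain[-1])  # value currently on the scan-out pin
        chain = [0] + chain[:-1]    # shift one position, filling scan-in with 0
    return unloaded


code = capture_thermometer(path_delay_ps=950, clock_period_ps=1000)
print(code, decode_slack(code), shift_out(code))
# prints [1, 1, 1, 1, 1, 0, 0, 0] 50 [0, 0, 0, 1, 1, 1, 1, 1]
```

A path with 950 ps of delay in a 1000 ps cycle passes five 10 ps taps, so the code decodes to 50 ps of slack; the resolution is one tap delay.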
Innovative practices session 3C: Advances in silicon debug & diagnosis
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116263 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
M. Ricchetti
Debug and diagnosis using the IEEE 1149.1 TAP has been a useful tool for engineers for some twenty years. The TAP, however, is limited: it can provide only a single full-duplex data stream of 50 to 100 Mb/s. IEEE 1500 provides higher bandwidth through parallel access to multiple scan channels, but providing physical access to hundreds of pins has become increasingly challenging, and this parallel access offers little benefit for debug once the SoC is in the system. This presentation focuses on the solution proposed by IEEE P1149.10, which uses a packet protocol over SERDES to access on-chip DFT instruments as the TAP does, but at multi-gigabit rates. Because the Procedural Description Language (PDL) standardized in IEEE 1149.1-2013 abstracts the TAP, PDL can also be used with a higher-speed interface such as the one proposed by P1149.10. Use cases of the proposed standard are shown, highlighting its benefits for silicon debug.
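To put the bandwidth gap in numbers, the sketch below compares the time needed to move the same debug payload over a TAP at the upper end of the quoted 50 to 100 Mb/s range and over a single multi-gigabit SERDES lane. The 5 Gb/s lane rate, the 2 Gbit payload, and the `efficiency` derating factor are illustrative assumptions; this is not the P1149.10 packet format or its actual throughput.

```python
def transfer_time_s(payload_bits: float, rate_bps: float, efficiency: float = 1.0) -> float:
    """Seconds to move payload_bits over a link of rate_bps, derated by
    efficiency in (0, 1] to account for protocol overhead."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return payload_bits / (rate_bps * efficiency)


PAYLOAD_BITS = 2e9  # hypothetical: 2 Gbit of scan/debug data
TAP_RATE = 100e6    # 100 Mb/s, upper end of the quoted TAP range
SERDES_RATE = 5e9   # hypothetical single multi-gigabit SERDES lane

tap_s = transfer_time_s(PAYLOAD_BITS, TAP_RATE)
serdes_s = transfer_time_s(PAYLOAD_BITS, SERDES_RATE)
print(f"TAP: {tap_s:.1f} s, SERDES: {serdes_s:.1f} s")  # prints "TAP: 20.0 s, SERDES: 0.4 s"
```

Framing overhead of a packet protocol can be approximated by passing an `efficiency` below 1; even at 50% efficiency the assumed SERDES lane stays an order of magnitude faster than the TAP.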
TMO: A new class of attack on cipher misusing test infrastructure
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116255 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
Subidh Ali, O. Sinanoglu
We present a new class of scan attack on hardware implementations of ciphers. Existing scan attacks on ciphers exploit the Design for Testability (DfT) infrastructure of the implementation: the attacker applies cipher inputs in functional mode and then switches to test mode to retrieve the secret key in the form of test responses. These attacks can be thwarted by applying a reset operation whenever the mode switches. However, the mode-reset countermeasure can itself be defeated by using only the test mode of a secure chip. In this work we show how a Test-Mode-Only (TMO) attack overcomes the constraints imposed by a mode-reset countermeasure, and we demonstrate TMO attacks on private-key as well as public-key ciphers.
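The toy model below sketches why a mode-reset countermeasure does not stop an attacker who never leaves test mode. It assumes a hypothetical one-round 8-bit cipher (key XOR followed by the 4-bit PRESENT S-box on each nibble) whose state register is on the scan chain while the key register is not. If capture clocks are allowed in test mode, scanning in a chosen state, pulsing one capture cycle, and scanning out exposes a key-dependent value. `ToyChip` and `tmo_attack` are illustrative only, not the attack or ciphers evaluated in the paper.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # 4-bit S-box of PRESENT
INV_SBOX = [SBOX.index(v) for v in range(16)]


def sub_bytes(x: int, table: list) -> int:
    """Apply a 4-bit substitution table to both nibbles of a byte."""
    return (table[x >> 4] << 4) | table[x & 0xF]


class ToyChip:
    """Hypothetical cipher core: the 8-bit state register is on the scan chain,
    the key register is not, and every mode switch clears the state register
    (the mode-reset countermeasure)."""

    def __init__(self, secret_key: int):
        self._key = secret_key & 0xFF
        self.state = 0
        self.test_mode = False

    def set_mode(self, test_mode: bool) -> None:
        if test_mode != self.test_mode:
            self.state = 0  # mode reset wipes any functional-mode intermediate
        self.test_mode = test_mode

    def _require_test_mode(self) -> None:
        if not self.test_mode:
            raise RuntimeError("scan access requires test mode")

    def scan_in(self, value: int) -> None:
        self._require_test_mode()
        self.state = value & 0xFF

    def scan_out(self) -> int:
        self._require_test_mode()
        return self.state

    def capture(self) -> None:
        """One clock through the round logic: state <- S(state XOR key)."""
        self.state = sub_bytes(self.state ^ self._key, SBOX)


def tmo_attack(chip: ToyChip) -> int:
    """Recover the key while staying in test mode, so no mode switch (and no
    reset) happens between loading the state and reading the response."""
    chip.set_mode(True)
    chosen = 0x00                 # attacker-chosen state, loaded via scan
    chip.scan_in(chosen)
    chip.capture()                # capture cycle runs one cipher round
    response = chip.scan_out()
    return sub_bytes(response, INV_SBOX) ^ chosen


print(hex(tmo_attack(ToyChip(0xA7))))  # prints 0xa7
```

By contrast, running the round in functional mode and then switching to test mode returns a cleared state register, which is exactly the case the mode-reset countermeasure was designed for.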
Panel: Analog/RF BIST: Are we there yet?
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116283 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
S. Ozev, L. Milor
BIST for analog and RF circuits was proposed many years ago, and we are still chasing it. One school of thought is to build generic BIST components for input stimulus generation and output analysis and use them in a plug-and-play fashion. Another is to develop dedicated circuits for each functionality and reuse the same blocks for the same functionality. A third approach is to design completely circuit-specific BIST for each primary circuit. The truth is that ad hoc examples of BIST have been around for years; however, there is no standardized way of implementing or inserting BIST for analog and RF circuits. The panelists, all experts in this domain, will share their views on the best way to implement BIST for analog and RF circuits, if there is such a thing…
PPB: Partially-working processors binning for maximizing wafer utilization
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116253 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
Da Cheng, S. Gupta
Hardware redundancy, such as spare processors and cores, has been added to chip multi-processors (CMPs) to improve yield while sustaining all of their functionality. During post-silicon testing, spare processors and cores are used for repair. Even after repair, some CMPs may have processors with an insufficient number of cores; in such CMPs some processors are disabled, and the chips are sold at lower prices to improve yield per area. Despite binning on the number of processors, substantial functional resources are wasted in disabled components. In this work, we propose a new utility function and a new repair algorithm that enable utilization of every working core on a CMP. We demonstrate the benefits of the proposed approach on benchmarks from ISPASS and the Nvidia CUDA SDK, using GPGPU-sim to compute instructions per cycle (IPC). Results show that our design and repair approaches provide above 50% IPC per wafer area even with 10x the current defect density.
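The contrast between disabling partially working processors and using every working core can be illustrated with a simple count. In the sketch below each processor is described by its number of working cores after manufacturing, spares included. `NOMINAL`, `SPARES`, and both counting functions are hypothetical stand-ins, not the utility function or repair algorithm proposed in the paper.

```python
from typing import List

NOMINAL = 4  # hypothetical cores each processor exposes when fully working
SPARES = 1   # hypothetical spare cores per processor available for repair


def _validate(working: List[int]) -> None:
    for w in working:
        if not 0 <= w <= NOMINAL + SPARES:
            raise ValueError(f"working-core count {w} out of range")


def conventional_usable_cores(working: List[int]) -> int:
    """Processor-level binning: a processor is enabled only if spares can restore
    all NOMINAL cores; otherwise it is disabled together with its working cores."""
    _validate(working)
    return sum(NOMINAL for w in working if w >= NOMINAL)


def every_core_usable_cores(working: List[int]) -> int:
    """Every-core alternative: partially working processors stay enabled and
    contribute their working cores, up to NOMINAL each."""
    _validate(working)
    return sum(min(w, NOMINAL) for w in working)


chip = [5, 4, 3, 2]  # working cores per processor, spares included
print(conventional_usable_cores(chip), every_core_usable_cores(chip))  # prints 8 13
```

For four processors with 5, 4, 3, and 2 working cores, processor-level binning keeps 8 cores while the every-core count keeps 13; the 5 cores on the two disabled processors are the kind of wasted resource the abstract refers to.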
A call to action: Securing IEEE 1687 and the need for an IEEE test Security Standard
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116256 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
Jennifer Dworak, A. Crouch
Today's chips often contain a wealth of embedded instruments, including sensors, hardware monitors, and built-in self-test (BIST) engines. They may process sensitive data that requires encryption or obfuscation, and may contain encryption keys and ChipIDs. Unfortunately, unauthorized access to internal registers or instruments through test and debug circuitry can turn design for testability (DFT) logic into a backdoor for data theft, reverse engineering, counterfeiting, and denial-of-service attacks. A compromised chip also poses a security threat to any board or system that includes it, and boards have their own security issues. We will provide an overview of some chip and board security concerns as they relate to DFT hardware and briefly review several ways in which the new IEEE 1687 standard can be made more secure. We will then discuss the need for an IEEE Security Standard that can provide solutions and metrics for delivering appropriate security matched to the needs of a real-world environment.
Disturbance-free BIST for loop characterization of DC-DC buck converters
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116250 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
Navankur Beohar, Priyanka Bakliwal, Sidhanto Roy, Debashis Mandal, P. Adell, B. Vermeire, B. Bakkaloglu, S. Ozev
Complex electronic systems include multiple power domains and drastically varying dynamic power consumption patterns, requiring multiple power conversion and regulation units. High-frequency switching converters have been gaining prominence in the DC-DC converter market due to their high efficiency. Unfortunately, they are also subject to larger process variations, which can jeopardize stable operation of the power supply. This paper presents a technique that tracks changes in the dynamic loop characteristics of DC-DC converters without disturbing normal operation, using white-noise-based excitation and correlation. The white-noise excitation is generated by a pseudo-random disturbance at the reference and PWM inputs of the converter, spreading the test-signal energy over a wide bandwidth below the converter's noise and ripple floor. The test signal is analyzed by correlating the pseudo-random input sequence with the output response, which accumulates the desired behavior over time and pulls it above the noise floor of the measurement setup. An off-the-shelf power converter, the LM27402, is used as the DUT for experimental verification. Experimental results show that the proposed technique can estimate the converter's natural frequency and Q-factor within ±2.5% and ±0.7% error, respectively, over changes in load inductance and capacitance.
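The correlation principle behind this disturbance-free BIST can be shown in a few lines. The sketch below drives a second-order discrete-time resonator (a stand-in for the converter's control loop, not a model of the LM27402) with a ±1 maximal-length sequence from a 10-bit LFSR, buries the response under Gaussian noise whose RMS is three times the signal's, and recovers the impulse response by cross-correlating the output with the known sequence over several periods. The resonator parameters `r` and `theta`, the noise level, and all function names are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple


def mls(nbits: int = 10, taps: Tuple[int, ...] = (10, 7), seed: int = 1) -> List[int]:
    """One period (2**nbits - 1 samples) of a maximal-length sequence from a
    Fibonacci LFSR (default feedback taps 10 and 7), mapped to +1/-1."""
    mask = (1 << nbits) - 1
    state = seed & mask
    seq = []
    for _ in range(mask):
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        seq.append(1 if state & 1 else -1)
        state = ((state << 1) | feedback) & mask
    return seq


def resonator(x: List[float], r: float = 0.9, theta: float = 0.3) -> List[float]:
    """y[n] = 2 r cos(theta) y[n-1] - r**2 y[n-2] + x[n] (poles at r*exp(+-j*theta))."""
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for v in x:
        y = a1 * y1 + a2 * y2 + v
        out.append(y)
        y1, y2 = y, y1
    return out


def identify(periods: int = 8, lags: int = 40, noise_ratio: float = 3.0, seed: int = 0):
    """Estimate h[0..lags-1] by cross-correlating the known PRBS input with a
    response whose added noise has noise_ratio times the signal's RMS."""
    pn = mls()
    n = len(pn)
    x = pn * (periods + 1)  # first period only lets the start-up transient decay
    clean = resonator(x)
    rms = math.sqrt(sum(v * v for v in clean[n:]) / (n * periods))
    rng = random.Random(seed)
    y = [v + rng.gauss(0.0, noise_ratio * rms) for v in clean]
    h_est = []
    for k in range(lags):
        acc = sum(y[i] * x[i - k] for i in range(n, len(x)))
        # ~h[k], up to a small O(1/n) offset from the -1 off-peak autocorrelation
        h_est.append(acc / (n * periods))
    h_true = resonator([1.0] + [0.0] * (lags - 1))
    return h_est, h_true


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


h_est, h_true = identify()
print(f"correlation with true impulse response: {pearson(h_est, h_true):.3f}")
```

Because the sequence's circular autocorrelation is N at lag zero and -1 elsewhere, each correlation lag approximates N*h[k], while the uncorrelated noise contribution shrinks roughly as 1/sqrt(N*periods). Extracting a natural frequency and Q-factor would need a further fit to `h_est`, which is not shown here.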
Resiliency challenges in sub-10nm technologies
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116281 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
R. Aitken, E. Cannon, M. Pant, M. Tahoori
Improvements in chip manufacturing technology, driven by the high degree of integration that small device sizes allow and by the additional complex functionality enabled by heterogeneous integration, have propelled an astonishing growth of computing systems. While the pervasiveness of these systems enables emerging application domains, this trend faces serious challenges at both the device and system levels. As the minimum feature size continues to shrink, a host of vulnerabilities affect the robustness, reliability, and resiliency of embedded and critical systems. Some of these factors are caused by the stochastic nature of the nanoscale manufacturing process, while others arise from high frequencies and nanoscale features. This paper presents the views of several key industrial players on the emerging resiliency challenges faced at extreme nanoscale technologies.
Testing cross wire opens within complex gates
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116301 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
Chao Han, A. Singh
Recent test studies on volume production data suggest that a significant number of CMOS open defects remain undetected by commonly applied transition delay fault (TDF) timing tests, potentially leading to high defectivity in shipped parts. This has focused attention on developing tests that explicitly target open faults, in particular transistor stuck-open faults (TSOFs). However, while TSOFs cover all open faults in circuits implemented from primitive logic gates, they do not model a type of open fault found only in complex CMOS gates. We refer to these as cross wire open (CWO) faults. In this paper, we develop the first tests that target CWOs. We observe that the fault list of potential CWOs can be significantly reduced if the layouts of the complex gate cells used in the design are available, but we present test generation methodologies both with and without this layout information. CWO fault coverage results for scan-based tests are presented for ISCAS89 and ITC99 benchmark circuits that have been resynthesized using an open-source cell library containing complex gates.
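The abstract does not spell out the cross-wire-open model, so the sketch below only illustrates the better-known transistor stuck-open behaviour it builds on: a stuck-open transistor can leave a CMOS output floating, so the node keeps its previous value and detection needs a two-pattern test (an initialization pattern followed by the test pattern). The 2-input NAND with a stuck-open pMOS on input A is a hypothetical example; charge leakage and charge sharing, which can corrupt the retained value in real circuits, are ignored.

```python
from typing import List, Tuple

Pattern = Tuple[int, int]  # (A, B) inputs of a 2-input NAND


def nand_output(a: int, b: int, prev: int, pmos_a_open: bool) -> int:
    """Output of a CMOS NAND2, optionally with the pMOS driven by A stuck open."""
    pull_up = (a == 0 and not pmos_a_open) or b == 0  # parallel pMOS network
    pull_down = a == 1 and b == 1                      # series nMOS network
    if pull_up:
        return 1
    if pull_down:
        return 0
    return prev  # output floats and keeps the charge stored by the previous pattern


def simulate(patterns: List[Pattern], pmos_a_open: bool, initial: int) -> List[int]:
    outputs, prev = [], initial
    for a, b in patterns:
        prev = nand_output(a, b, prev, pmos_a_open)
        outputs.append(prev)
    return outputs


def detects(patterns: List[Pattern]) -> bool:
    """True if the faulty circuit's outputs differ from the good circuit's for
    every possible initial output value."""
    return all(
        simulate(patterns, False, init) != simulate(patterns, True, init)
        for init in (0, 1)
    )


print(detects([(0, 1)]))          # prints False: result depends on unknown prior state
print(detects([(1, 1), (0, 1)]))  # prints True: (1,1) first initializes the output to 0
```

The pair ((0, 0), (0, 1)) also fails, because its initialization pattern leaves the output at 1, which the floating node then retains.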
Field, experimental, and analytical data on large-scale HPC systems and evaluation of the implications for exascale system design
Pub Date: 2015-04-27 | DOI: 10.1109/VTS.2015.7116295 | 2015 IEEE 33rd VLSI Test Symposium (VTS)
Nathan Debardeleben, S. Blanchard, D. Kaeli, P. Rech
Reliability is a concern for designers, producers, and users of today's large-scale computing systems. As we approach exascale, the resilience challenge will become critical because of the increase in system scale. It is therefore fundamental to understand the nature of errors, evaluate their probability of occurrence, and improve the design to reduce their impact on the overall system. In this paper we present experimental, field, and analytical data that characterize and quantify errors on accelerators, providing a thorough understanding of the impact of errors on current and future large-scale systems.
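The scaling argument can be made concrete with a first-order estimate. Assuming independent nodes with a constant failure rate expressed in FIT (failures per 10^9 device-hours), and treating any node error as a system error, the system mean time between failures is 10^9 / (N x FIT), so it falls inversely with node count N. The per-node FIT value below is hypothetical, not data from the paper.

```python
def system_mtbf_hours(nodes: int, fit_per_node: float) -> float:
    """System MTBF in hours, assuming `nodes` independent components with a
    constant failure rate of `fit_per_node` FIT (failures per 1e9 device-hours)
    and that an error in any node counts as a system error."""
    if nodes <= 0 or fit_per_node <= 0:
        raise ValueError("nodes and fit_per_node must be positive")
    system_fit = nodes * fit_per_node  # failure rates of independent parts add
    return 1e9 / system_fit


FIT_PER_NODE = 1000.0  # hypothetical per-node error rate
for nodes in (10_000, 100_000):
    print(nodes, system_mtbf_hours(nodes, FIT_PER_NODE))
# prints:
# 10000 100.0
# 100000 10.0
```

Under these assumptions a tenfold increase in system scale cuts the MTBF from 100 hours to 10 hours, which is why per-node error rates that are tolerable today become a central design issue at exascale.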