Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509649
Li Yu, O. Mysore, Lan Wei, L. Daniel, D. Antoniadis, I. Elfadel, D. Boning
In this paper, we present the first validation of the virtual source (VS) charge-based compact model for standard cell libraries and large-scale digital circuits. With only a modest number of physically meaningful parameters, the VS model accounts for the main short-channel effects in nanometer technologies. Using a novel DC and transient parameter extraction methodology, the model is verified with simulated data from a well-characterized, industrial 40-nm bulk silicon model. The VS model is used to fully characterize a standard cell library with timing comparisons showing less than 2.7% error with respect to the industrial design kit. Furthermore, a 1001-stage inverter chain and a 32-bit ripple-carry adder are employed as test cases in a vendor CAD environment to validate the use of the VS model for large-scale digital circuit applications. Parametric Vdd sweeps show that the VS model is also ready for usage in low-power design methodologies. Finally, runtime comparisons have shown that the use of the VS model results in a speedup of about 7.6×.
{"title":"An ultra-compact virtual source FET model for deeply-scaled devices: Parameter extraction and validation for standard cell libraries and digital circuits","authors":"Li Yu, O. Mysore, Lan Wei, L. Daniel, D. Antoniadis, I. Elfadel, D. Boning","doi":"10.1109/ASPDAC.2013.6509649","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509649","url":null,"abstract":"In this paper, we present the first validation of the virtual source (VS) charge-based compact model for standard cell libraries and large-scale digital circuits. With only a modest number of physically meaningful parameters, the VS model accounts for the main short-channel effects in nanometer technologies. Using a novel DC and transient parameter extraction methodology, the model is verified with simulated data from a well-characterized, industrial 40-nm bulk silicon model. The VS model is used to fully characterize a standard cell library with timing comparisons showing less than 2.7% error with respect to the industrial design kit. Furthermore, a 1001-stage inverter chain and a 32-bit ripple-carry adder are employed as test cases in a vendor CAD environment to validate the use of the VS model for large-scale digital circuit applications. Parametric Vdd sweeps show that the VS model is also ready for usage in low-power design methodologies. Finally, runtime comparisons have shown that the use of the VS model results in a speedup of about 7.6×.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133258997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509608
Qing'an Li, Jianhua Li, Liang Shi, C. Xue, Yiran Chen, Yanxiang He
Spin-Transfer Torque RAM (STT-RAM) has been proposed to build on-chip caches because of its attractive features: high storage density and negligible leakage power. Recently, researchers propose to improve the write performance of STT-RAM by relaxing its non-volatility property. To avoid data loss resulting from volatility, refresh schemes are proposed. However, refresh operations consume additional energy. In this paper, we propose to reduce the number of refresh operations through re-arranging program data layout at compilation time. An N-refresh scheme is also proposed. Experimental results show that, on average, the proposedmethods can reduce the number of refresh operations by 73.3%, and reduce the dynamic energy consumption by 27.6%.
{"title":"Compiler-assisted refresh minimization for volatile STT-RAM cache","authors":"Qing'an Li, Jianhua Li, Liang Shi, C. Xue, Yiran Chen, Yanxiang He","doi":"10.1109/ASPDAC.2013.6509608","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509608","url":null,"abstract":"Spin-Transfer Torque RAM (STT-RAM) has been proposed to build on-chip caches because of its attractive features: high storage density and negligible leakage power. Recently, researchers propose to improve the write performance of STT-RAM by relaxing its non-volatility property. To avoid data loss resulting from volatility, refresh schemes are proposed. However, refresh operations consume additional energy. In this paper, we propose to reduce the number of refresh operations through re-arranging program data layout at compilation time. An N-refresh scheme is also proposed. Experimental results show that, on average, the proposedmethods can reduce the number of refresh operations by 73.3%, and reduce the dynamic energy consumption by 27.6%.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"27 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113973221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509692
Xuexin Liu, A. A. Palma-Rodriguez, S. Rodriguez-Chavez, S. Tan, E. Tlelo-Cuautle, Yici Cai
Yield estimation for analog integrated circuits are crucial for analog circuit design and optimization in the presence of process variations. In this paper, we present a novel analog yield estimation method based on performance bound analysis technique in frequency domain. The new method first derives the transfer functions of linear (or linearized) analog circuits via a graph-based symbolic analysis method. Then frequency response bounds of the transfer functions in terms of magnitude and phase are obtained by a nonlinear constrained optimization technique. To predict yield rate, bound information are employed to calculate Gaussian distribution functions. Experimental results show that the new method can achieve similar accuracy while delivers 20 times speedup over Monte Carlo simulation of HSPICE on some typical analog circuits.
{"title":"Performance bound and yield analysis for analog circuits under process variations","authors":"Xuexin Liu, A. A. Palma-Rodriguez, S. Rodriguez-Chavez, S. Tan, E. Tlelo-Cuautle, Yici Cai","doi":"10.1109/ASPDAC.2013.6509692","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509692","url":null,"abstract":"Yield estimation for analog integrated circuits are crucial for analog circuit design and optimization in the presence of process variations. In this paper, we present a novel analog yield estimation method based on performance bound analysis technique in frequency domain. The new method first derives the transfer functions of linear (or linearized) analog circuits via a graph-based symbolic analysis method. Then frequency response bounds of the transfer functions in terms of magnitude and phase are obtained by a nonlinear constrained optimization technique. To predict yield rate, bound information are employed to calculate Gaussian distribution functions. Experimental results show that the new method can achieve similar accuracy while delivers 20 times speedup over Monte Carlo simulation of HSPICE on some typical analog circuits.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"254 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114406731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509555
M. Ebrahimi, M. Daneshtalab, J. Plosila, Farhad Mehdipour
The communication requirements of many-core embedded systems are convened by the emerging Network-on-Chip (NoC) paradigm. As on-chip communication reliability is a crucial factor in many-core systems, the NoC paradigm should address the reliability issues. Using fault-tolerant routing algorithms to reroute packets around faulty regions will increase the packet latency and create congestion around the faulty region. On the other hand, the performance of NoC is highly affected by the network congestion. Congestion in the network can increase the delay of packets to route from a source to a destination, so it should be avoided. In this paper, a minimal and defect-resilient (MD) routing algorithm is proposed in order to route packets adaptively through the shortest paths in the presence of a faulty link, as long as a path exists. To avoid congestion, output channels can be adaptively chosen whenever the distance from the current to destination node is greater than one hop along both directions. In addition, an analytical model is presented to evaluate MD for two-faulty cases.
{"title":"MD: Minimal path-based fault-tolerant routing in on-Chip Networks","authors":"M. Ebrahimi, M. Daneshtalab, J. Plosila, Farhad Mehdipour","doi":"10.1109/ASPDAC.2013.6509555","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509555","url":null,"abstract":"The communication requirements of many-core embedded systems are convened by the emerging Network-on-Chip (NoC) paradigm. As on-chip communication reliability is a crucial factor in many-core systems, the NoC paradigm should address the reliability issues. Using fault-tolerant routing algorithms to reroute packets around faulty regions will increase the packet latency and create congestion around the faulty region. On the other hand, the performance of NoC is highly affected by the network congestion. Congestion in the network can increase the delay of packets to route from a source to a destination, so it should be avoided. In this paper, a minimal and defect-resilient (MD) routing algorithm is proposed in order to route packets adaptively through the shortest paths in the presence of a faulty link, as long as a path exists. To avoid congestion, output channels can be adaptively chosen whenever the distance from the current to destination node is greater than one hop along both directions. In addition, an analytical model is presented to evaluate MD for two-faulty cases.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131857396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509639
Yun Liang, Zheng Cui, K. Rupnow, Deming Chen
GPUs are an increasingly popular implementation platform for a variety of general purpose applications from mobile and embedded devices to high performance computing. The CUDA and OpenCL parallel programming models enable easy utilization of the GPU's resources. However, tuning GPU applications' performance is a complex and labor intensive task. Software programmers employ a variety of optimization techniques to explore tradeoffs between the thread parallelism and performance of a single thread. However, prior techniques ignore register allocation, a significant factor in single thread performance and, indirectly affects the number of simultaneously active threads. In this paper, we show that joint optimization of register allocation and thread structure has great potential to significantly improve performance. However, the design space for this joint optimization can be large; therefore, we develop performance metrics appropriate for evaluation within a compiler's inner loop and efficient design space exploration techniques that use the metrics to narrow the search space. Across a range of GPU applications, we achieve average performance speedup of 1.33X (up to 1.73X) with design space exploration 355X faster than the exhaustive search.
{"title":"Register and thread structure optimization for GPUs","authors":"Yun Liang, Zheng Cui, K. Rupnow, Deming Chen","doi":"10.1109/ASPDAC.2013.6509639","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509639","url":null,"abstract":"GPUs are an increasingly popular implementation platform for a variety of general purpose applications from mobile and embedded devices to high performance computing. The CUDA and OpenCL parallel programming models enable easy utilization of the GPU's resources. However, tuning GPU applications' performance is a complex and labor intensive task. Software programmers employ a variety of optimization techniques to explore tradeoffs between the thread parallelism and performance of a single thread. However, prior techniques ignore register allocation, a significant factor in single thread performance and, indirectly affects the number of simultaneously active threads. In this paper, we show that joint optimization of register allocation and thread structure has great potential to significantly improve performance. However, the design space for this joint optimization can be large; therefore, we develop performance metrics appropriate for evaluation within a compiler's inner loop and efficient design space exploration techniques that use the metrics to narrow the search space. Across a range of GPU applications, we achieve average performance speedup of 1.33X (up to 1.73X) with design space exploration 355X faster than the exhaustive search.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134128336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509684
Bernard Schmidt, Carlos Villarraga, J. Bormann, D. Stoffel, Markus Wedler, W. Kunz
This paper describes a method to generate a computational model for formal verification of hardware-dependent software in embedded systems. The computational model of the combined HW/SW system is a program netlist (PN) consisting of instruction cells connected in a directed acyclic graph that compactly represents all execution paths of the software. The model can be easily integrated into SAT-based verification environments such as those based on Bounded Model Checking (BMC). The proposed construction of the model, however, allows for an efficient reasoning of the SAT solver over entire execution paths. We demonstrate the efficiency of our approach by presenting experimental results from the formal verification of an industrial LIN (Local Interconnect Network) bus node, implemented as a software driver on a 32-bit RISC machine.
{"title":"A computational model for SAT-based verification of hardware-dependent low-level embedded system software","authors":"Bernard Schmidt, Carlos Villarraga, J. Bormann, D. Stoffel, Markus Wedler, W. Kunz","doi":"10.1109/ASPDAC.2013.6509684","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509684","url":null,"abstract":"This paper describes a method to generate a computational model for formal verification of hardware-dependent software in embedded systems. The computational model of the combined HW/SW system is a program netlist (PN) consisting of instruction cells connected in a directed acyclic graph that compactly represents all execution paths of the software. The model can be easily integrated into SAT-based verification environments such as those based on Bounded Model Checking (BMC). The proposed construction of the model, however, allows for an efficient reasoning of the SAT solver over entire execution paths. We demonstrate the efficiency of our approach by presenting experimental results from the formal verification of an industrial LIN (Local Interconnect Network) bus node, implemented as a software driver on a 32-bit RISC machine.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130524197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509643
H. Qian, Hao Liang, Chip-Hong Chang, Wei Zhang, Hao Yu
This paper presents a fast and accurate steady state thermal simulator for heatsink and microfluid-cooled 3D-ICs. This model considers the thermal effect of TSVs at fine-granularity by calculating the anisotropic equivalent thermal conductances of a solid grid cell if TSVs are inserted. Entrance effect of microchannels is also investigated for accurate modeling of microfluidic cooling. The proposed thermal simulator is verified against commercial multiphysics solver COMSOL and compared with Hotspot and 3D-ICE. Simulation results shows that for heatsink cooling, the proposed simulator is as accurate as Hotspot but runs much faster at moderate granularity. For microfluidic cooling, our proposed simulator is much more accurate than 3D-ICE in its estimation of steady state temperature and thermal distribution.
{"title":"Thermal simulator of 3D-IC with modeling of anisotropic TSV conductance and microchannel entrance effects","authors":"H. Qian, Hao Liang, Chip-Hong Chang, Wei Zhang, Hao Yu","doi":"10.1109/ASPDAC.2013.6509643","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509643","url":null,"abstract":"This paper presents a fast and accurate steady state thermal simulator for heatsink and microfluid-cooled 3D-ICs. This model considers the thermal effect of TSVs at fine-granularity by calculating the anisotropic equivalent thermal conductances of a solid grid cell if TSVs are inserted. Entrance effect of microchannels is also investigated for accurate modeling of microfluidic cooling. The proposed thermal simulator is verified against commercial multiphysics solver COMSOL and compared with Hotspot and 3D-ICE. Simulation results shows that for heatsink cooling, the proposed simulator is as accurate as Hotspot but runs much faster at moderate granularity. For microfluidic cooling, our proposed simulator is much more accurate than 3D-ICE in its estimation of steady state temperature and thermal distribution.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"54 60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124705261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509655
Koji Inoue
This paper introduces a manycore research project called SMYLE (Scalable ManYcore for Low Energy computing). The aims of this project are: 1) proposing a manycore SoC architecture and developing a suitable programming and execution environment, 2) designing a domain specific manycore system for emerging video mining applications, and 3) releasing developed software tools and FPGA emulation environments to accelerate manycore research and development in the community. The project started in December 2010 with full support from the New Energy and Industrial Technology Development Organization (NEDO).
本文介绍了一个名为SMYLE (Scalable many - core for Low Energy computing)的多核研究项目。该项目的目标是:1)提出一个多核SoC架构并开发一个合适的编程和执行环境;2)为新兴的视频挖掘应用设计一个特定领域的多核系统;3)发布开发的软件工具和FPGA仿真环境,以加速社区的多核研究和开发。该项目于2010年12月在新能源和工业技术发展组织(NEDO)的全力支持下启动。
{"title":"SMYLE Project: Toward high-performance, low-power computing on manycore-processor SoCs","authors":"Koji Inoue","doi":"10.1109/ASPDAC.2013.6509655","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509655","url":null,"abstract":"This paper introduces a manycore research project called SMYLE (Scalable ManYcore for Low Energy computing). The aims of this project are: 1) proposing a manycore SoC architecture and developing a suitable programming and execution environment, 2) designing a domain specific manycore system for emerging video mining applications, and 3) releasing developed software tools and FPGA emulation environments to accelerate manycore research and development in the community. The project started in December 2010 with full support from the New Energy and Industrial Technology Development Organization (NEDO).","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124307275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509566
Chao Sun, Hiroki Fujii, K. Miyaji, K. Johguchi, K. Higuchi, K. Takeuchi
A 3D through-silicon-via (TSV)-integrated hybrid ReRAM/multi-level-cell (MLC) NAND solid-state drive's (SSD's) architecture is proposed with NAND-like interface (I/F) and sector-access overwrite policy for ReRAM. Furthermore, intelligent data management algorithms are proposed to suppress data fragmentation and excess usage of MLC NAND. As a result, 11-times performance increase, 6.9-times endurance enhancement and 93% write energy reduction are achieved. Both ReRAM write and read latency should be less than 3 μs to obtain these improvements. The required endurance for ReRAM is 105.
{"title":"Over 10-times high-speed, energy efficient 3D TSV-integrated hybrid ReRAM/MLC NAND SSD by intelligent data fragmentation suppression","authors":"Chao Sun, Hiroki Fujii, K. Miyaji, K. Johguchi, K. Higuchi, K. Takeuchi","doi":"10.1109/ASPDAC.2013.6509566","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509566","url":null,"abstract":"A 3D through-silicon-via (TSV)-integrated hybrid ReRAM/multi-level-cell (MLC) NAND solid-state drive's (SSD's) architecture is proposed with NAND-like interface (I/F) and sector-access overwrite policy for ReRAM. Furthermore, intelligent data management algorithms are proposed to suppress data fragmentation and excess usage of MLC NAND. As a result, 11-times performance increase, 6.9-times endurance enhancement and 93% write energy reduction are achieved. Both ReRAM write and read latency should be less than 3 μs to obtain these improvements. The required endurance for ReRAM is 105.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124481043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2013-04-29DOI: 10.1109/ASPDAC.2013.6509667
Jude Angelo Ambrose, H. Pettenghi, L. Sousa
Security in embedded systems is of critical importance since most of our secure transactions are currently made via credit cards or mobile phones. Power analysis based side channel attacks have been proved as the most successful attacks on embedded systems to retrieve secret keys, allowing impersonation and theft. State-of-the-art solutions for such attacks in Elliptic Curve Cryptography (ECC), mostly in software, hinder performance and repeatedly attacked using improved techniques. To protect the ECC from both simple power analysis and differential power analysis, as a hardware solution, we propose to take advantage of the inherent parallelization capability in Multi-modulo Residue Number Systems (RNS) architectures to obfuscate the secure information. Random selection of moduli is proposed to randomly choose the moduli sets for each key bit operation. This solution allows us to prevent power analysis, while still providing all the benefits of RNS. In this paper, we show that Differential Power Analysis is thwarted, as well as correlation analysis.
{"title":"DARNS:A randomized multi-modulo RNS architecture for double-and-add in ECC to prevent power analysis side channel attacks","authors":"Jude Angelo Ambrose, H. Pettenghi, L. Sousa","doi":"10.1109/ASPDAC.2013.6509667","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509667","url":null,"abstract":"Security in embedded systems is of critical importance since most of our secure transactions are currently made via credit cards or mobile phones. Power analysis based side channel attacks have been proved as the most successful attacks on embedded systems to retrieve secret keys, allowing impersonation and theft. State-of-the-art solutions for such attacks in Elliptic Curve Cryptography (ECC), mostly in software, hinder performance and repeatedly attacked using improved techniques. To protect the ECC from both simple power analysis and differential power analysis, as a hardware solution, we propose to take advantage of the inherent parallelization capability in Multi-modulo Residue Number Systems (RNS) architectures to obfuscate the secure information. Random selection of moduli is proposed to randomly choose the moduli sets for each key bit operation. This solution allows us to prevent power analysis, while still providing all the benefits of RNS. In this paper, we show that Differential Power Analysis is thwarted, as well as correlation analysis.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127486906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}