
Latest publications from the 2012 IEEE 30th International Conference on Computer Design (ICCD)

Enhancing 3T DRAMs for SRAM replacement under 10nm tri-gate SOI FinFETs
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378657
Z. Jaksic, R. Canal
In this paper, we present a dynamic 3T memory cell for future 10nm tri-gate FinFETs as a potential replacement for the classical 6T SRAM cell in high-speed cache memories. We investigate the read access time, retention time, and static power consumption of the cell when it is exposed to the effects of process and environmental variations. Process variations are extracted from ITRS predictions and modeled at the device level. For simulation, we use the 10nm SOI tri-gate FinFET BSIM-CMG model card developed by the Device Modeling Group at the University of Glasgow. Compared to the classical 6T SRAM, the 3T cell has a 40% smaller area and up to 14× lower leakage, while access time is approximately the same. To achieve higher retention times, we propose several cell extensions which, at the same time, enable post-fabrication/run-time adaptability.
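As a rough illustration of the retention/refresh trade-off such a dynamic gain cell faces (this is not the authors' model; the capacitance, leakage, write voltage, and read-margin values below are arbitrary placeholders), a minimal behavioral sketch:

```python
# Minimal behavioral sketch of a dynamic (3T-style) gain cell storage node.
# NOT the authors' model: capacitance, leakage, and voltage values are
# placeholders chosen only to illustrate the retention/refresh trade-off
# the paper optimizes.
C_STORAGE = 1e-16       # storage-node capacitance [F] (placeholder)
I_LEAK    = 5e-12       # off-state leakage current [A] (placeholder)
V_WRITE   = 0.7         # voltage written onto the node [V]
V_MARGIN  = 0.35        # lowest node voltage the read port can still sense [V]

def retention_time(v_write=V_WRITE, v_margin=V_MARGIN,
                   c=C_STORAGE, i_leak=I_LEAK):
    """Time until a stored '1' decays below the read margin (linear-leak model)."""
    return c * (v_write - v_margin) / i_leak

def refresh_power(e_refresh_per_cell, n_cells, t_retention):
    """Average refresh power if every cell is rewritten once per retention period."""
    return e_refresh_per_cell * n_cells / t_retention

t_ret = retention_time()
print(f"retention time ~ {t_ret * 1e6:.2f} us")
print(f"refresh power for 1 Mbit ~ "
      f"{refresh_power(1e-15, 2**20, t_ret) * 1e6:.2f} uW")
```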
Citations: 4
Adaptive Backpressure: Efficient buffer management for on-chip networks
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378673
Daniel U. Becker, Nan Jiang, George Michelogiannakis, W. Dally
This paper introduces Adaptive Backpressure, a novel scheme that improves the utilization of dynamically managed router input buffers by continuously adjusting the stiffness of the flow control feedback loop in response to observed traffic conditions. Through a simple extension to the router's flow control mechanism, the proposed scheme heuristically limits the number of credits available to individual virtual channels based on estimated downstream congestion, aiming to minimize the amount of buffer space that is occupied unproductively. This leads to more efficient distribution of buffer space and improves isolation between multiple concurrently executing workloads with differing performance characteristics. Experimental results for a 64-node mesh network show that Adaptive Backpressure improves network stability, leading to an average 2.6× increase in throughput under heavy load across traffic patterns. In the presence of background traffic, the proposed scheme reduces zero-load latency by an average of 31%. Finally, it mitigates the performance degradation encountered when latency- and throughput-optimized execution cores contend for network resources in a heterogeneous chip multi-processor; across a set of PARSEC benchmarks, we observe an average reduction in execution time of 34%.
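A minimal sketch of the general idea of capping a virtual channel's usable credits from an estimate of downstream congestion; the quota heuristic, the congestion estimator, and the class layout below are illustrative assumptions, not the paper's actual policy:

```python
# Toy sketch of credit-limited virtual-channel flow control in the spirit of
# Adaptive Backpressure: each VC's usable credits are capped by an estimate
# of downstream congestion.  The quota formula is illustrative only.

class VirtualChannel:
    def __init__(self, buffer_depth):
        self.buffer_depth = buffer_depth   # downstream flit slots nominally available
        self.credits_used = 0              # flits currently buffered downstream
        self.stalled_flits = 0             # recently forwarded flits seen to stall

    def congestion_estimate(self, window_flits):
        """Fraction of recently forwarded flits that were observed to stall."""
        return self.stalled_flits / max(window_flits, 1)

    def credit_quota(self, window_flits, min_quota=2):
        """Shrink the usable credit count as downstream congestion grows."""
        congestion = self.congestion_estimate(window_flits)
        quota = round(self.buffer_depth * (1.0 - congestion))
        return max(min_quota, quota)

    def can_send(self, window_flits):
        return self.credits_used < self.credit_quota(window_flits)


vc = VirtualChannel(buffer_depth=8)
vc.credits_used, vc.stalled_flits = 5, 12
print(vc.credit_quota(window_flits=16))   # heavy congestion -> small quota
print(vc.can_send(window_flits=16))
```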
Citations: 31
An efficient reliability simulation flow for evaluating the hot carrier injection effect in CMOS VLSI circuits
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378663
M. Kamal, Q. Xie, M. Pedram, A. Afzali-Kusha, S. Safari
The hot carrier injection (HCI) effect is one of the major reliability concerns in VLSI circuits. This paper presents a scalable reliability simulation flow, including a logic cell characterization method and an efficient full-chip simulation method, to analyze HCI-induced transistor aging with fast run time and high accuracy. The transistor-level HCI effect is modeled based on the Reaction-Diffusion (R-D) framework. The gate-level HCI impact characterization method combines HSpice simulation and piecewise linear curve fitting. The proposed characterization method reveals that the HCI effect on some transistors is much more significant than on others, depending on the logic cell structure. Additionally, during circuit simulation, pertinent transitions are identified and all cells in the circuit are classified into two groups: critical and non-critical. The proposed method reduces the simulation time while maintaining high accuracy by applying fine-granularity simulation time steps to the critical cells and coarse-granularity ones to the non-critical cells in the circuit.
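The following sketch illustrates two ingredients mentioned in the abstract: piecewise-linear fitting of a characterized degradation curve, and simulating critical cells with fine time steps and non-critical cells with coarse ones. The characterization points and activity factors are made-up placeholders standing in for HSpice data, not values from the paper:

```python
# Illustrative sketch only: (1) piecewise-linear interpolation of delay
# degradation versus stress time, (2) per-cell aging accumulation with fine
# steps for critical cells and coarse steps for non-critical cells.
import bisect

# Placeholder characterization points (stress time in hours, % delay increase).
stress_hours = [0, 100, 1000, 5000, 10000]
delay_shift  = [0.0, 0.4, 1.5, 3.2, 4.1]

def degradation(t):
    """Piecewise-linear interpolation of the delay shift at stress time t."""
    i = bisect.bisect_right(stress_hours, t)
    if i == 0:
        return delay_shift[0]
    if i == len(stress_hours):
        return delay_shift[-1]
    t0, t1 = stress_hours[i - 1], stress_hours[i]
    d0, d1 = delay_shift[i - 1], delay_shift[i]
    return d0 + (d1 - d0) * (t - t0) / (t1 - t0)

def simulate(cells, horizon_h, fine_step_h=10, coarse_step_h=1000):
    """Accumulate aging per cell; only critical cells get fine-grained steps."""
    aged = {}
    for name, (critical, activity) in cells.items():
        step = fine_step_h if critical else coarse_step_h
        t, total = 0, 0.0
        while t < horizon_h:
            # stress accrued in this step, scaled by the cell's switching activity
            total += activity * (degradation(t + step) - degradation(t))
            t += step
        aged[name] = total
    return aged

print(degradation(2500))                       # between the 1000 h and 5000 h points
print(simulate({"U1": (True, 0.9), "U42": (False, 0.2)}, horizon_h=10000))
```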
Citations: 12
WaveSync: A low-latency source synchronous bypass network-on-chip architecture
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378647
Yoon Seok Yang, Reeshav Kumar, G. Choi, Paul V. Gratz
WaveSync is a low-latency focused, network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs. WaveSync facilitates low-latency communication by leveraging the source-synchronous clock sent with the data to time components in the downstream routers' data path, reducing the number of synchronizations needed. WaveSync accomplishes this by partitioning the router components at each node into different clock domains, each synchronized with one of the orthogonal incoming source-synchronous clocks in a GALS 2D mesh network. The data and clock subsequently propagate through each node/router, synchronously, until the destination is reached, regardless of the number of hops it may take. As long as the data travels along the path of clock propagation and no congestion is encountered, it is propagated without latching, as if along a long combinatorial path, with both the clock and the data accruing delay at the same rate. The result is that the need for synchronization between mesochronous nodes and/or the asynchronous control associated with a typical GALS network is completely eliminated. The proposed WaveSync network outperforms conventional GALS networks by 87-90% in average latency, with 1.8-6.5 times higher throughput across synthetic traffic patterns and the SPLASH-2 benchmark suite.
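A toy model of the claimed benefit (not WaveSync's actual router microarchitecture): on an XY-routed path, count how many synchronizer crossings a conventional per-hop mesochronous design would need versus a source-synchronous bypass in which only a change of travel direction leaves the accompanying clock's propagation path:

```python
# Toy model contrasting per-hop synchronization in a conventional mesochronous
# mesh with a source-synchronous bypass in which a flit that keeps following
# its accompanying clock's propagation direction needs no resynchronization.
# This illustrates the idea only; it is not WaveSync's actual design.

def xy_route(src, dst):
    """Dimension-ordered (X then Y) route as a list of unit direction vectors."""
    sx, sy = src
    dx, dy = dst
    steps = []
    steps += [((1, 0) if dx > sx else (-1, 0))] * abs(dx - sx)
    steps += [((0, 1) if dy > sy else (0, -1))] * abs(dy - sy)
    return steps

def sync_crossings(route):
    """Conventional mesochronous: one synchronization per hop.
    Source-synchronous bypass (toy model): a crossing only when the travel
    direction changes, i.e. the flit leaves its clock's propagation path."""
    conventional = len(route)
    bypass = sum(1 for a, b in zip(route, route[1:]) if a != b)
    return conventional, bypass

route = xy_route((0, 0), (3, 2))        # 3 hops east, then 2 hops north
print(sync_crossings(route))            # (5, 1): one turn on the XY path
```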
Citations: 12
SECRET: Selective error correction for refresh energy reduction in DRAMs
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378619
Chung-Hsiang Lin, De-Yu Shen, Yi-Jung Chen, Chia-Lin Yang, Cheng-Yuan Michael Wang
DRAMs are used as the main memory in most computing systems today. Studies show that DRAMs contribute a significant part of overall system power consumption. Therefore, one of the main challenges in low-power DRAM design is the inevitable refresh process. Due to process variation, memory cells exhibit retention time variations. Current DRAMs use a single worst-case refresh period, and prolonging refresh intervals introduces retention errors. Previous works adopt conventional ECC (Error Correcting Code) to correct retention errors; these approaches introduce significant area and energy overheads. In this paper, we propose a novel error correction framework for retention errors in DRAMs, called SECRET (Selective Error Correction for Refresh Energy reducTion). The key observation we make is that retention errors can be treated as hard errors rather than soft errors, and only a few DRAM cells have large leakage. Therefore, instead of equipping all memory cells with error correction capability as existing ECC schemes do, we allocate error correction information only to leaky cells under a given refresh interval. Our SECRET framework contains two parts: an off-line phase that identifies memory cells with retention errors given a target error rate, and a low-overhead error correction mechanism. The experimental results show that the proposed SECRET framework can reduce refresh power by 87.2% and overall DRAM power by 18.57%, with negligible area and performance overheads.
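A small sketch of the off-line profiling idea behind selective correction: test retention at the extended refresh interval, record the addresses of cells that fail, and keep correction information only for those leaky cells. The injected failure rate and the per-cell correction entry are illustrative assumptions, not the paper's encoding:

```python
# Sketch of off-line profiling for selective correction.  The retention test
# here just injects random failures; a real flow would measure the cells.
import random

random.seed(1)
N_CELLS = 1 << 16
EXTENDED_INTERVAL_MS = 256          # vs. a 64 ms worst-case refresh interval

def retention_test(n_cells, interval_ms, leaky_fraction=0.001):
    """Stand-in for a hardware retention test: returns addresses of cells that
    fail to hold their value for `interval_ms` (failures injected randomly)."""
    return {addr for addr in range(n_cells)
            if random.random() < leaky_fraction * (interval_ms / 64)}

leaky_cells = retention_test(N_CELLS, EXTENDED_INTERVAL_MS)

# Only the leaky cells get a correction entry (e.g. the stored bit value or a
# pointer into a small repair table), instead of ECC over every word.
correction_table = {addr: None for addr in sorted(leaky_cells)}

print(f"{len(leaky_cells)} of {N_CELLS} cells need correction "
      f"({100 * len(leaky_cells) / N_CELLS:.2f}%)")
```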
Citations: 49
CoNA: Dynamic application mapping for congestion reduction in many-core systems
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378665
Mohammad Fattah, Marco Ramírez, M. Daneshtalab, P. Liljeberg, J. Plosila
Increasing the number of processors in a single chip toward network-based many-core systems requires a run-time task allocation algorithm. We propose an efficient mapping algorithm that assigns the communicating tasks of incoming applications onto the resources of a many-core system utilizing the Network-on-Chip paradigm. In our contiguous neighborhood allocation (CoNA) algorithm, we target the reduction of both internal and external congestion, given the detrimental impact of congestion on network performance. We approach this goal by keeping the mapped region contiguous and placing communicating tasks in a close neighborhood. A fully synthesizable simulation environment, in which none of the system objects is assumed to be ideal, is provided. Experiments show at least a 40% gain in different mapping cost functions, as well as a 16% reduction in average network latency compared to existing algorithms.
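A minimal sketch of the contiguity idea: grow the allocated region outward from a first node by breadth-first search over free mesh nodes, so communicating tasks land on adjacent routers. CoNA's actual first-node selection and task ordering are more involved; this shows only the "keep the mapped region contiguous" principle:

```python
# Contiguous, neighborhood-first mapping of an application's tasks onto free
# nodes of a 2D mesh via BFS from a chosen first node (illustrative only).
from collections import deque

def contiguous_map(tasks, first_node, mesh_size, occupied):
    """Assign tasks to mesh nodes in BFS order starting from first_node."""
    w, h = mesh_size
    mapping, frontier, seen = {}, deque([first_node]), {first_node}
    tasks = list(tasks)
    while frontier and tasks:
        x, y = frontier.popleft()
        if (x, y) not in occupied:
            mapping[tasks.pop(0)] = (x, y)
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen:
                seen.add((nx, ny))
                frontier.append((nx, ny))
    return mapping

app_tasks = ["t0", "t1", "t2", "t3", "t4"]
print(contiguous_map(app_tasks, first_node=(2, 2), mesh_size=(8, 8),
                     occupied={(3, 2)}))
```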
Citations: 68
A novel profiled side-channel attack in presence of high Algorithmic Noise
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378675
Mostafa M. I. Taha, P. Schaumont
Understanding the nature of hardware designs is a vital element in successful side-channel analysis. The inherent parallelism of these designs adds excessive algorithmic noise to the power consumption trace, which makes it difficult to mount a successful power attack. In this paper, we address this high algorithmic noise with a novel profiled attack that is generic and independent of any specific cryptographic algorithm. We propose both a new profiling phase and two new insights for the attack phase. The proposed profiling technique takes the high design parallelism into consideration, which results in a more accurate power model. In the attack phase, we first define two new targeted regions in the power trace, then aggregate the attack results from each of them to obtain a more powerful attack. The proposed attack model has been tested on the 128-bit AES of the widely known DPA Contest (V2) and achieved a stable 80% Global Success Rate (GSR) at 2755 traces.
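Since the paper's profiling technique is novel, the sketch below shows only the generic structure of a profiled power attack (estimate a leakage model per intermediate value during profiling, then score key guesses against it), not the authors' method; the Hamming-weight model and the identity S-box are placeholders:

```python
# Generic profiled power attack skeleton, NOT the specific technique of this
# paper: profiling estimates the mean leakage per Hamming weight of a targeted
# intermediate; the attack ranks key-byte guesses against that model.
import numpy as np

AES_SBOX_STUB = np.arange(256, dtype=np.uint8)      # placeholder S-box

def hamming_weight(x):
    return bin(int(x)).count("1")

def profile(traces, plaintexts, known_key_byte, sample_idx):
    """Mean observed leakage per Hamming weight of the targeted intermediate."""
    model = {hw: [] for hw in range(9)}
    for trace, pt in zip(traces, plaintexts):
        inter = AES_SBOX_STUB[pt ^ known_key_byte]
        model[hamming_weight(inter)].append(trace[sample_idx])
    return {hw: float(np.mean(v)) for hw, v in model.items() if v}

def attack(traces, plaintexts, model, sample_idx):
    """Rank key-byte guesses by (negative) squared error against the model."""
    scores = np.zeros(256)
    for guess in range(256):
        pred = np.array([model.get(hamming_weight(AES_SBOX_STUB[pt ^ guess]), 0.0)
                         for pt in plaintexts])
        scores[guess] = -np.sum((traces[:, sample_idx] - pred) ** 2)
    return int(np.argmax(scores))
```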
Citations: 1
Retrospective on “Power-Sensitive Multithreaded Architecture”
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378609
J. Seng, D. Tullsen, George Z. N. Cai
This article provides a retrospective look at the research that went into the 2000 ICCD paper “Power-Sensitive Multithreaded Architecture”. At the time, simultaneous multithreading processors were soon to be commercially available and power consumption was proving to be a challenging design constraint. That research introduced optimizations that increased power and energy efficiency through multithreading, while maintaining performance. This article discusses the optimizations in the paper and discusses how processor designs have changed since its publication.
Citations: 1
Design and evaluation of a delay-based FPGA Physically Unclonable Function
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378632
Aaron Mills, Sudhanshu Vyas, Michael Patterson, Christopher Sabotta, Phillip H. Jones, Joseph Zambreno
A new Physically Unclonable Function (PUF) variant was developed on an FPGA, and its quality evaluated. It is conceptually similar to PUFs developed using standard SRAM cells, except it utilizes general FPGA reconfigurable fabric, which offers several advantages. Comparison between our approach and other PUF designs indicates that our design is competitive in terms of repeatability within a given instance, and uniqueness between instances. The design can also be tuned to achieve desired response characteristics which broadens the potential range of applications.
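The two qualities mentioned (repeatability within an instance, uniqueness between instances) are commonly quantified with intra- and inter-device Hamming distance over response bit-strings; a small sketch with made-up responses, independent of the paper's specific delay-based design:

```python
# Standard PUF quality metrics over raw response bit-strings: repeatability
# (intra-device Hamming distance across repeated readouts, ideally ~0%) and
# uniqueness (inter-device Hamming distance, ideally ~50%).

def hamming_distance_pct(a, b):
    assert len(a) == len(b)
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)

def repeatability(responses_one_device):
    """Average intra-device HD between a reference response and re-measurements."""
    ref, others = responses_one_device[0], responses_one_device[1:]
    return sum(hamming_distance_pct(ref, r) for r in others) / len(others)

def uniqueness(responses_many_devices):
    """Average pairwise inter-device HD."""
    dists = [hamming_distance_pct(a, b)
             for i, a in enumerate(responses_many_devices)
             for b in responses_many_devices[i + 1:]]
    return sum(dists) / len(dists)

device_a = ["10110010", "10110011", "10110010"]     # same PUF, repeated readout
devices  = ["10110010", "01101101", "11010001"]     # different PUF instances
print(f"repeatability (intra-HD): {repeatability(device_a):.1f}%")
print(f"uniqueness    (inter-HD): {uniqueness(devices):.1f}%")
```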
Citations: 17
Cloud computing: Virtualization and resiliency for data center computing
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378606
V. Salapura
Cloud computing is being rapidly adopted across the IT industry, driven by the need to reduce the total cost of ownership of increasingly demanding workloads. Within companies, private clouds offer a more efficient way to manage and use private data centers. In the broader marketplace, public clouds offer the promise of buying computing capabilities based on a utility model. This utility model enables IT consumers to purchase compute resources on demand to fit current business needs and to scale the expenses associated with computing resources. Thus, cloud computing allows IT to be treated as an ongoing, variable operating expense billed by usage, rather than requiring capital expenditures that must be planned years in advance. Advantageously, operating expenses can be charged directly against the revenue they generate. In contrast, capital expenses incurred by the purchase of a system must be paid at the time of purchase, but can only be depreciated to reduce taxable income over the lifetime of the system.
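A small worked example of the closing point, with arbitrary figures: a usage-billed operating expense is deducted in the year it is incurred, while an equivalent capital purchase is paid up front and only depreciated (straight-line here) over the system's lifetime:

```python
# Illustrative opex-vs-capex comparison; all figures are arbitrary.
purchase_price = 120_000          # capex: server bought outright
lifetime_years = 4
annual_cloud_bill = 36_000        # opex: equivalent usage-billed capacity

straight_line_depreciation = purchase_price / lifetime_years
for year in range(1, lifetime_years + 1):
    print(f"year {year}: opex deduction = {annual_cloud_bill:>7,} ; "
          f"capex cash out = {purchase_price if year == 1 else 0:>7,} ; "
          f"capex deduction = {straight_line_depreciation:>9,.0f}")
```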
Citations: 21