
2016 International Great Lakes Symposium on VLSI (GLSVLSI): Latest Papers

A parallel random walk solver for the capacitance calculation problem in touchscreen design
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903011
Zhezhao Xu, Wenjian Yu, Chao Zhang, Bolong Zhang, Meijuan Lu, M. Mascagni
In this paper, we present a random-walk-based solver that calculates capacitances for verifying a touchscreen design. To suit the complicated conductor geometries in touchscreen structures, we extend the floating random walk (FRW) method to handle non-Manhattan conductors. A unified dielectric pre-characterization scheme is proposed to suit arbitrary dielectric profiles while maintaining high accuracy. The algorithm is implemented on a computer cluster, which enables massively parallel computing. Numerical experiments validate the accuracy of the proposed techniques and show a parallel speedup of up to 67X. Compared with other schemes, the unified dielectric pre-characterization scheme exhibits the highest accuracy while using the least memory.
Citations: 12
Design and comparative evaluation of a hybrid Cache memory at architectural level
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903002
Wei Wei, K. Namba, F. Lombardi
A hybrid memory cell usually consists of a Static Random Access Memory (SRAM) cell and an embedded Dynamic Random Access Memory (eDRAM) cell; hybrid cells are particularly suitable for cache design. A novel hybrid cache memory scheme (that also has non-volatile elements) is first proposed; this scheme is assessed through extensive simulation to show significant improvements in performance. Different design implementations of the hybrid cache are then proposed at the architectural level, and different features (such as the memory hit rate, the Instructions Per Cycle (IPC) access pattern and the memory cell access time) are also simulated at this level using benchmarks to show the advantages of the proposed scheme for use as a hybrid cache.
Citations: 3
Low energy sketching engines on many-core platform for big data acceleration
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2902984
A. Kulkarni, Tahmid Abtahi, E. Smith, T. Mohsenin
Almost 90% of the data available today was created within the last couple of years, so processing Big Data sets is of utmost importance. Many solutions have been investigated to increase processing speed and memory capacity; however, the I/O bottleneck is still a critical issue. To tackle this issue, we adopt a sketching technique to reduce data communication. Reconstruction of the sketched matrix is performed using Orthogonal Matching Pursuit (OMP). Additionally, we propose a Gradient Descent OMP (GD-OMP) algorithm to reduce hardware complexity. Real-time big data processing imposes rigid constraints on the sketching kernel; hence, to further reduce hardware overhead, both algorithms are implemented on a low-power, domain-specific many-core platform called Power Efficient Nano Clusters (PENC). The GD-OMP algorithm is evaluated for image reconstruction accuracy and on the PENC many-core architecture. Implementation results show that for large matrix sizes the GD-OMP algorithm is 1.3× faster and consumes 1.4× less energy than OMP algorithm implementations. Compared to GPU and quad-core CPU implementations, the PENC many-core reconstructs 5.4× and 9.8× faster, respectively, for large signal sizes with higher sparsity.
Citations: 14
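As a rough illustration of the OMP reconstruction step named in the abstract above, the following Python sketch implements textbook Orthogonal Matching Pursuit. It is not the authors' GD-OMP variant or their PENC implementation. The function names `dot`, `solve_linear`, and `omp` are hypothetical, and the dictionary columns are assumed to have unit norm.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / aug[i][i]
    return z


def omp(columns, y, sparsity):
    """Greedy sparse recovery: y is approximated by a few dictionary columns.

    columns  : list of dictionary columns (assumed unit norm)
    y        : measurement vector
    sparsity : number of columns to select
    """
    residual = list(y)
    support, coeffs = [], []
    for _ in range(sparsity):
        # Pick the unused column most correlated with the residual.
        scores = [abs(dot(c, residual)) for c in columns]
        for j in support:
            scores[j] = -1.0
        best = max(range(len(columns)), key=scores.__getitem__)
        support.append(best)
        # Least-squares fit on the selected columns via the normal equations.
        gram = [[dot(columns[i], columns[j]) for j in support] for i in support]
        rhs = [dot(columns[i], y) for i in support]
        coeffs = solve_linear(gram, rhs)
        residual = [
            y[k] - sum(c * columns[j][k] for c, j in zip(coeffs, support))
            for k in range(len(y))
        ]
    x = [0.0] * len(columns)
    for c, j in zip(coeffs, support):
        x[j] = c
    return x
```

The least-squares solve inside the loop is the costly part of each iteration, which is presumably the kind of work a gradient-descent variant aims to simplify; the abstract does not give those details, so none are assumed here.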
DCC: Double capacity Cache architecture for narrow-width values
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2902990
M. Imani, S. Patil, T. Simunic
Modern caches are designed to hold 64-bit-wide data; however, a proportion of the data in the caches continues to be narrow-width. In this paper, we propose a new cache architecture that increases the effective cache capacity by up to 2X for systems with narrow-width values, while also improving power efficiency, bandwidth, and reliability. The proposed double capacity cache (DCC) architecture uses fast and efficient peripheral circuitry to store two narrow-width values in a single wordline. To minimize the latency overhead in workloads without narrow-width data, flag bits are added to the tag store. The proposed DCC architecture decreases the cache miss rate by 50%, which results in a 27% performance improvement and 30% higher dynamic energy efficiency. To improve reliability, DCC modifies the data distribution on individual bits, which results in 20% and 25% average static-noise margin (SNM) improvements in L1 and L2 caches, respectively.
Citations: 7
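The idea of storing two narrow-width values in one wordline, with a flag recording whether the line is packed, can be sketched in software as follows. This is only an illustration: the 32-bit narrow-width threshold, the 64-bit wordline, and the names `is_narrow`, `pack_pair`, `unpack_pair`, and `try_pack` are assumptions, and the sketch says nothing about the peripheral circuitry the abstract describes.

```python
WORD_BITS = 64            # assumed wordline width
NARROW_BITS = 32          # assumed narrow-width threshold (half a word)
NARROW_MASK = (1 << NARROW_BITS) - 1


def is_narrow(value):
    """True if an unsigned value fits in the assumed narrow width."""
    return 0 <= value <= NARROW_MASK


def pack_pair(low, high):
    """Pack two narrow values into one 64-bit word (low half, high half)."""
    if not (is_narrow(low) and is_narrow(high)):
        raise ValueError("both values must be narrow-width")
    return (high << NARROW_BITS) | low


def unpack_pair(word):
    """Recover the two narrow values from a packed word."""
    return word & NARROW_MASK, word >> NARROW_BITS


def try_pack(first, second):
    """Return (word, packed_flag) for a wordline holding `first`.

    If both values are narrow they share the line and the flag is set;
    otherwise the line holds only `first` and the flag is cleared, so a
    reader of an unpacked line can skip the unpacking step.
    """
    if is_narrow(first) and is_narrow(second):
        return pack_pair(first, second), True
    return first, False
```

Keeping the flag separate from the data word mirrors the abstract's point that flag bits in the tag store let workloads without narrow-width data avoid extra latency.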
An enhanced analytical electrical masking model for multiple event transients
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903007
Adam Watkins, S. Tragoudas
As transistor feature sizes shrink, the susceptibility of modern circuits to radiation-induced errors has increased. This, in turn, has increased the likelihood of multiple transients affecting a circuit. An important aspect of modeling convergent pulses is the approximation of the gate output. Thus, in this paper, a model that approximates the output pulse shape for convergent inputs is proposed. Extensive simulations show that the proposed model closely matches HSPICE and provides a speed-up of 15X.
Citations: 5
Delay estimates for graphene nanoribbons: A novel measure of fidelity and experiments with global routing trees
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903036
Subrata Das, Soma Das, Adrija Majumder, P. Dasgupta, D. K. Das
With the extreme miniaturization of traditional CMOS devices at deep sub-micron design levels, the delay of a circuit, as well as its power dissipation and area, is dominated by the interconnections between logic blocks. In the search for alternative materials, graphene nanoribbons (GNRs) have been found to have potential for both transistors and interconnects due to their outstanding electrical and thermal properties. GNRs provide better options as materials for global routing trees in VLSI circuits. However, certain special characteristics of GNRs prohibit direct application of existing VLSI routing tree construction methods to GNR-based interconnection trees. In this paper, we address this issue, possibly for the first time, and propose a heuristic method for constructing GNR-based minimum-delay Steiner trees based on a linear-cum-bending hybrid delay model. Experimental results demonstrate the effectiveness of our proposed methods. We propose a novel technique for analyzing the relative accuracy of the delay estimates using rank correlation and a statistical significance test. We also compute the delays for the trees generated by the hybrid delay heuristic using the Elmore delay approximation and use them to determine the relative accuracy of the hybrid delay estimate.
Citations: 8
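Two ingredients named in the abstract above, the Elmore delay approximation and rank correlation between delay estimates, can be sketched in a few lines of Python. This is a generic textbook illustration, not the authors' hybrid GNR delay model or their significance test. The tree encoding (a `parent` list), the function names, and the use of Spearman's formula without ties are all assumptions.

```python
def elmore_delays(parent, resistance, capacitance):
    """Elmore delay from the root to every node of an RC tree.

    parent[v]      : index of v's parent, or None for the root
    resistance[v]  : resistance of the wire from parent[v] to v (ignored at root)
    capacitance[v] : capacitance lumped at node v
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    root = None
    for v, p in enumerate(parent):
        if p is None:
            root = v
        else:
            children[p].append(v)
    # Breadth-first order: every node appears after its parent.
    order = [root]
    for v in order:
        order.extend(children[v])
    # Total capacitance in the subtree rooted at each node.
    downstream = list(capacitance)
    for v in reversed(order):
        if parent[v] is not None:
            downstream[parent[v]] += downstream[v]
    # Each wire contributes R(wire) * C(everything below the wire).
    delay = [0.0] * n
    for v in order:
        if parent[v] is not None:
            delay[v] = delay[parent[v]] + resistance[v] * downstream[v]
    return delay


def spearman_no_ties(a, b):
    """Spearman rank correlation, assuming no tied values in a or b."""
    n = len(a)

    def ranks(values):
        order = sorted(range(n), key=values.__getitem__)
        result = [0] * n
        for rank, idx in enumerate(order):
            result[idx] = rank
        return result

    ra, rb = ranks(a), ranks(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

A rank correlation near 1 between two sets of delay estimates for the same trees means the two models order the trees the same way, which is the sense of fidelity that a rank-based comparison captures.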
An offline frequent value encoding for energy-efficient MLC/TLC non-volatile memories
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2902979
Ali Alsuwaiyan, K. Mohanram
This paper describes a low-overhead, offline frequent value encoding (FVE) solution to reduce the write energy in multi-level/triple-level cell (MLC/TLC) non-volatile memories (NVMs). The proposed solution, which does not require any runtime software support, clusters a set of general-purpose applications according to their data frequency profiles and generates a dedicated offline FVE that minimizes write energy for each cluster. Results show that the write energy reduction of evaluation sets - using FVEs generated for training sets - is close (equal) to that of the best known solution for MLC (TLC) NVM encoding; however, our solution incurs a memory overhead that is 16× (5.7×) less than the best comparable scheme in the literature for MLC (TLC) NVMs.
Citations: 5
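The general principle behind a frequency-driven encoding, mapping the values written most often to the cell states that cost the least energy to program, can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's FVE construction: it handles a single frequency profile rather than clusters of applications, the per-state energies are made-up numbers, and `build_encoding` and `write_energy` are hypothetical names.

```python
from collections import Counter


def build_encoding(training_symbols, state_energy):
    """Map each symbol to a cell state using an offline frequency profile.

    training_symbols : sequence of symbols (0 .. number of states - 1)
    state_energy     : state_energy[t] is the assumed cost of writing state t
    The most frequent symbol gets the cheapest state, and so on.
    """
    counts = Counter(training_symbols)
    n = len(state_energy)
    symbols = sorted(range(n), key=lambda s: (-counts[s], s))
    states = sorted(range(n), key=lambda t: (state_energy[t], t))
    return dict(zip(symbols, states))


def write_energy(symbols, encoding, state_energy):
    """Total energy to write a symbol sequence under a given encoding."""
    return sum(state_energy[encoding[s]] for s in symbols)
```

For example, with hypothetical MLC state energies `[1.0, 4.0, 6.0, 2.0]` and the profile `[3, 3, 3, 1, 1, 0]`, symbol 3 is assigned to state 0, and the same sequence costs 11.0 units instead of the 15.0 units of the identity mapping.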
Area-efficient error-resilient discrete fourier transformation design using stochastic computing
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2902978
Bo Yuan, Yanzhi Wang, Zhongfeng Wang
Discrete Fourier Transformation (DFT) and Fast Fourier Transformation (FFT) are widely used techniques in numerous modern signal processing applications. In general, because of their inherently multiplication-intensive nature, hardware implementations of DFT/FFT usually require a large amount of hardware resources, which limits their use in area-constrained scenarios. To overcome this challenge, this paper, for the first time, proposes area-efficient error-resilient DFT designs using stochastic computing. By leveraging low-complexity stochastic multipliers, two types of stochastic DFT design are presented with a significant reduction in overall area. Analysis results show that, compared with the conventional design, the two proposed 256-point stochastic DFT designs achieve 76% and 62% reductions in area, respectively. More importantly, these stochastic DFT designs also show much stronger error resilience, which is very attractive in the nanoscale CMOS era.
Citations: 9
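To show why stochastic multipliers are low in complexity, the sketch below simulates the standard unipolar stochastic-computing multiplier, in which a value in [0, 1] is encoded as the fraction of 1s in a random bitstream and a single AND gate multiplies two independent streams. This is a generic illustration, not the multiplier used in the paper's DFT designs, which must handle signed values and would therefore need a different encoding. The names `to_stream` and `stochastic_multiply` and the default stream length are assumptions.

```python
import random


def to_stream(p, length, rng):
    """Encode probability p as a bitstream whose fraction of 1s is about p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stochastic_multiply(a, b, length=4096, seed=0):
    """Estimate a * b for a, b in [0, 1] using a bitwise AND of two streams.

    For independent streams, P(bit_a AND bit_b) = a * b, so the fraction
    of 1s in the output stream estimates the product.
    """
    rng = random.Random(seed)
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    product = [x & y for x, y in zip(stream_a, stream_b)]
    return sum(product) / length
```

The estimate is noisy, with an error that shrinks as the stream gets longer. A single flipped bit changes the result by only `1 / length`, which is the intuition behind the error resilience the abstract highlights.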
Prolonging lifetime of non-volatile last level caches with cluster mapping
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2902980
Morteza Soltani, Mohammad Ebrahimi, Z. Navabi
Recently, work has been done on using nonvolatile cells, such as Spin Transfer Torque RAM (STT-RAM) or Magnetic RAM (M-RAM), to construct last level caches (LLCs). These structures mitigate the leakage power and density problems found in traditional SRAM cells. However, the low endurance of nonvolatile caches decreases the lifetime of the LLC. Therefore, an effective wear-leveling technique is required to tackle this issue. In this paper, we propose an inter-set algorithm that distributes the write traffic to all portions of the cache. Our method is based on cluster mapping that dynamically swaps two clusters during the operation of the system. Since the inter-set algorithm is based on data movement, a large amount of data must be transferred in each replacement. For efficient data movement with minimal effect on performance, we develop a novel scheduling technique that utilizes the idle time of the LLC in the computation phase of the processors. Our approach effectively improves the lifetime of the LLC with negligible performance and area overhead. Using these methods in a quad-core system with a 2MB LLC, we can improve the lifetime of the non-volatile LLC by 30% on average.
Citations: 8
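The basic effect of remapping clusters to spread write traffic can be illustrated with a toy model. The abstract does not spell out the swap policy, so everything here is an assumption: the `ClusterMap` class, the fixed `swap_interval`, and the rule of exchanging the most-written and least-written physical clusters. The sketch also ignores the data movement and idle-time scheduling that the paper addresses.

```python
class ClusterMap:
    """Toy logical-to-physical cluster remapping for wear leveling."""

    def __init__(self, num_clusters, swap_interval):
        self.mapping = list(range(num_clusters))   # logical -> physical
        self.writes = [0] * num_clusters           # writes per physical cluster
        self.swap_interval = swap_interval         # hypothetical tuning knob
        self.since_swap = 0

    def write(self, logical):
        """Record one write to a logical cluster; return the physical one used."""
        physical = self.mapping[logical]
        self.writes[physical] += 1
        self.since_swap += 1
        if self.since_swap >= self.swap_interval:
            self._swap_hot_and_cold()
            self.since_swap = 0
        return physical

    def _swap_hot_and_cold(self):
        """Exchange the mappings of the most- and least-written clusters."""
        indices = range(len(self.writes))
        hot = max(indices, key=self.writes.__getitem__)
        cold = min(indices, key=self.writes.__getitem__)
        logical_hot = self.mapping.index(hot)
        logical_cold = self.mapping.index(cold)
        self.mapping[logical_hot] = cold
        self.mapping[logical_cold] = hot
```

With four clusters and a swap every 10 writes, 20 writes to the same logical cluster end up split across two physical clusters, so the worst-case wear is 10 writes rather than 20.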
Enhancing fault emulation of transient faults by separating combinational and sequential fault propagation
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903021
R. Nyberg, Johann Heyszl, Dietmar Heinz, G. Sigl
We present a fault emulation environment capable of injecting single and multiple transient faults in sequential as well as combinational logic. It is used to perform fault injection campaigns during design verification of security circuits such as smart cards. To reduce the unacceptable hardware overhead of fault emulation for combinational faults, we split the problem of combinational fault modeling into two steps: 1) fault injection in combinational cells and propagation into sequential cells, processed by a software approach, and 2) fast FPGA-based emulation of faults in sequential logic. We used the presented tool to emulate single and multiple faults in two different designs used for security applications. We analyzed how faults propagate from combinational to sequential logic, discussed the resulting consequences for developers of security circuits and fault analysis environments, and derived performance optimizations. We demonstrate the performance of our method with varying tests and varying fault multiplicities. Interestingly, we found that the presented method outperforms conventional standalone FPGA-based approaches while requiring 45% fewer logic elements on the FPGA.
Citations: 6