
Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07): Latest Articles

Statistical information processing: Computing for the nanoscale era
Naresh R Shanbhag
Computing platforms operating at the limits of energy efficiency need to contend with the issue of robustness. This energy vs. robustness trade-off is fundamental in such systems. This talk will describe a Shannon-inspired framework referred to as statistical information processing (SIP). SIP navigates the energy vs. robustness trade-off by treating the problem of energy-efficient computing as one of information processing on low-SNR and unreliable nanoscale device/circuit fabrics. In doing so, SIP seeks to transform computing from its von Neumann roots in data processing to a Shannon-inspired foundation for information processing. Key elements of SIP are the use of information-based metrics, a stochastic low-SNR circuit fabric, and statistical error compensation techniques based on estimation theory, detection theory, and machine learning. SIP has been used for designing energy-efficient and robust computation, communication, storage, and mixed-signal analog front-ends. This talk will conclude with a brief overview of the Systems On Nanoscale Information fabriCs (SONIC) Center, a 5-year multi-university research center focused on developing a Shannon/brain-inspired foundation for information processing on CMOS and beyond-CMOS nanoscale fabrics.
DOI: https://doi.org/10.1109/ISLPED.2015.7273480
Published: 2015-07-22
Citations: 3
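One way to picture the statistical error compensation mentioned in the abstract above is algorithmic noise tolerance, where a cheap low-precision estimator flags and replaces outputs of an error-prone main block. The sketch below illustrates that idea only; it is an assumption about what such compensation can look like, not the speaker's implementation, and `ant_correct`, the threshold, and all data values are hypothetical.

```python
def ant_correct(main_out, estimate, threshold):
    """Keep the main-block output unless it disagrees with the estimate by more than threshold."""
    return main_out if abs(main_out - estimate) <= threshold else estimate


# Hypothetical data: exact results, a main block with two large but rare errors,
# and an estimator whose error is small but always present.
exact = [10, 52, 33, 71, 18]
main_outs = [10, 52 + 64, 33, 71 - 128, 18]
estimates = [11, 50, 34, 72, 17]

corrected = [ant_correct(m, e, threshold=8) for m, e in zip(main_outs, estimates)]
max_err_before = max(abs(m - x) for m, x in zip(main_outs, exact))
max_err_after = max(abs(c - x) for c, x in zip(corrected, exact))
print(corrected, max_err_before, max_err_after)
```

The design choice here is that large, infrequent errors are traded for small, bounded ones: the residual error is limited by the estimator's accuracy rather than by the worst-case fault in the main block.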
Memristor-based approximated computation
Boxun Li, Yi Shan, Miao Hu, Yu Wang, Yiran Chen, Huazhong Yang
The cessation of Moore's Law has limited further improvements in power efficiency. In recent years, the physical realization of the memristor has demonstrated a promising solution to ultra-integrated hardware realization of neural networks, which can be leveraged for better performance and power efficiency. In this work, we introduce a power-efficient framework for approximated computation that takes advantage of memristor-based multilayer neural networks. A programmable memristor approximated computation unit (Memristor ACU) is introduced first to accelerate approximated computation, and a scalable memristor-based approximated computation framework is proposed on top of the Memristor ACU. We also introduce a parameter configuration algorithm for the Memristor ACU and a feedback state tuning circuit to program the Memristor ACU effectively. Our simulation results show that the maximum error of the Memristor ACU for 6 common complex functions is only 1.87%, while the state tuning circuit can achieve 12-bit precision. The implementation of the HMAX model atop our proposed memristor-based approximated computation framework demonstrates a 22× power efficiency improvement over its purely digital counterpart.
DOI: https://doi.org/10.1109/ISLPED.2013.6629302
Published: 2013-09-04
Citations: 83
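To give a feel for what the 12-bit tuning precision quoted in the abstract above means for a stored network weight, the following sketch quantizes weights to 12 bits and checks the rounding error. It is a generic uniform quantizer, not the paper's tuning circuit; the weight range of [-1, 1], the function name `quantize`, and the sample weights are assumptions made for illustration.

```python
def quantize(w, bits=12, w_min=-1.0, w_max=1.0):
    """Round w to the nearest of 2**bits uniformly spaced levels in [w_min, w_max]."""
    levels = (1 << bits) - 1
    step = (w_max - w_min) / levels
    code = round((w - w_min) / step)
    code = max(0, min(levels, code))  # clamp values outside the assumed range
    return w_min + code * step


step = 2.0 / 4095                      # spacing between adjacent 12-bit levels
weights = [-0.73, 0.0, 0.123456, 0.999]  # hypothetical weights
worst = max(abs(quantize(w) - w) for w in weights)
print(f"step = {step:.6f}, worst rounding error = {worst:.6f}")
```

Under these assumptions the rounding error of any in-range weight is at most half a step, which is about 0.012% of the full weight range.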
SRAM cell optimization for low AVT transistors
L. Clark, S. Leshner, G. Tien
In this paper, we describe a six-transistor static random access memory (SRAM) cell optimization methodology for transistors with significantly improved matching, while maintaining compatibility with the baseline design. We briefly describe the reduced-AVT transistors and show that they allow a substantially lower minimum SRAM operating voltage (Vmin) and lower array leakage. Using an efficient design of experiments (DOE) factorial as a pseudo-Monte Carlo generator, points on the tail of the distribution are directly simulated. The highly efficient method is shown to allow optimization and "what-if" scenario investigations. Simulation and silicon results on a 65-nm process as well as simulation results on a 28-nm process are shown.
DOI: https://doi.org/10.1109/ISLPED.2013.6629267
Published: 2013-09-04
Citations: 6
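The abstract above relies on two ideas that a short sketch can make concrete: threshold-voltage mismatch scales with AVT (the Pelgrom model, sigma(Vt) = AVT / sqrt(W·L)), and a factorial design can place simulation points directly on the distribution tail. The code below is a generic two-level full factorial over the six cell transistors, not the paper's actual DOE; the device dimensions, AVT values, transistor labels, and the choice of 3-sigma corners are all hypothetical.

```python
from itertools import product
from math import sqrt

# Hypothetical labels for the six transistors of a 6T cell
TRANSISTORS = ("PU_L", "PU_R", "PD_L", "PD_R", "PG_L", "PG_R")


def sigma_vt_mv(avt_mv_um, w_um, l_um):
    """Pelgrom mismatch model: sigma(Vt) in mV for AVT in mV*um and W, L in um."""
    return avt_mv_um / sqrt(w_um * l_um)


def factorial_corners(sigma_mv, k=3.0):
    """Two-level full factorial: every transistor shifted by +k or -k sigma."""
    return [
        dict(zip(TRANSISTORS, (sign * k * sigma_mv for sign in signs)))
        for signs in product((-1, 1), repeat=len(TRANSISTORS))
    ]


sigma_base = sigma_vt_mv(3.0, 0.2, 0.05)  # assumed baseline AVT
sigma_low = sigma_vt_mv(1.5, 0.2, 0.05)   # assumed reduced AVT
corners = factorial_corners(sigma_low)
print(f"sigma: {sigma_base:.1f} mV -> {sigma_low:.1f} mV, {len(corners)} corners")
```

Each corner would then be passed to a circuit simulator as a set of per-transistor Vt shifts; 64 targeted runs stand in for the very large number of random samples a plain Monte Carlo sweep would need to reach the same tail.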
An ultralow-power memory-based big-data computing platform by nonvolatile domain-wall nanowire devices
Yuhao Wang, Hao Yu
A recently introduced non-volatile memory (NVM) device, the domain-wall nanowire (or racetrack), has shown potential not only for main memory storage but also for computing. In this paper, the domain-wall nanowire is studied as the basis of a memory-based computing platform for ultra-low-power big-data processing. A domain-wall-nanowire-based logic-in-memory architecture is proposed for big-data processing, in which the domain-wall nanowire memory is deployed as main memory for data storage as well as XOR logic for comparison and addition operations. The domain-wall-nanowire-based logic-in-memory circuits are evaluated by SPICE-level verification. Further evaluated with the general-purpose SPEC2006 benchmark and the web-search-oriented Phoenix benchmark, the proposed computing platform exhibits significant power savings in both main memory and ALU at similar performance when compared to CMOS-based designs.
DOI: https://doi.org/10.5555/2648668.2648748
Published: 2013-09-04
Citations: 23
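The abstract above names XOR as the logic primitive behind comparison and addition. The sketch below shows, in plain software, why XOR is sufficient for equality comparison and for the sum bit of an adder. It says nothing about how the nanowire circuits are built; in particular, the carry is computed here with ordinary AND/OR operations, which is an assumption, since the abstract does not describe carry generation.

```python
def xor_equal(a, b):
    """Two words are equal exactly when their bitwise XOR is zero."""
    return (a ^ b) == 0


def ripple_add(a, b, width=8):
    """Bit-serial adder: each sum bit is x XOR y XOR carry_in."""
    carry = 0
    total = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))  # assumed conventional carry logic
    return total, carry


print(xor_equal(0b1011, 0b1011), xor_equal(0b1011, 0b1001))
print(ripple_add(100, 57), ripple_add(200, 100))
```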
Breaking the boundary for whole-system performance optimization of big data
Yan Li, Kun Wang, Qi Guo, Xin Li, Xiaochen Zhang, Guancheng Chen, Tao Liu, Jian Li
MapReduce plays a critical role in finding insights in Big Data. The performance optimization of MapReduce programs is challenging because it requires a comprehensive understanding of the whole system, including both hardware layers (processors, storage, networks, etc.) and software stacks (operating systems, JVM, runtime, applications, etc.). However, most existing performance tuning and optimization is based on empirical and heuristic attempts. How to build a systematic framework that breaks the boundary between layers for performance optimization remains an open question. In this paper, we propose a performance evaluation framework that correlates performance metrics from different layers, which provides insights to efficiently pinpoint performance issues. This framework is composed of a series of predefined patterns. Each pattern indicates one or more potential issues. The behavior of a MapReduce program is mapped to the corresponding resource utilization. The framework provides a holistic approach that allows users at different levels of experience to optimize MapReduce program performance. We use the Terasort benchmark running on a 10-node Power7R2 cluster as a real case to show how this framework improves performance. With this framework, we reduce the Terasort run time from 47 minutes to less than 8 minutes. In addition to best practices for performance tuning, several key findings are summarized as valuable workload analysis for JVM, MapReduce runtime, and application design.
DOI: https://doi.org/10.5555/2648668.2648699
Published: 2013-09-04
Citations: 8
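The pattern-based diagnosis described in the abstract above can be pictured as a table of rules over cross-layer utilization metrics. The sketch below is only an illustration of that structure: the pattern names, thresholds, metric keys, and hints are all invented here and do not come from the paper. The last line computes the speedup implied by the reported run times, which is a lower bound because the final time is given as "less than 8 minutes".

```python
# Each hypothetical pattern: (name, predicate over utilization metrics, hint)
PATTERNS = [
    ("disk-bound", lambda m: m["disk"] > 0.9 and m["cpu"] < 0.5,
     "tasks wait on storage"),
    ("network-bound", lambda m: m["net"] > 0.9 and m["cpu"] < 0.5,
     "data transfer saturates the network"),
    ("gc-heavy", lambda m: m["gc"] > 0.2,
     "JVM spends a large share of time in garbage collection"),
]


def diagnose(metrics):
    """Return (name, hint) for every pattern whose predicate matches."""
    return [(name, hint) for name, matches, hint in PATTERNS if matches(metrics)]


# Hypothetical utilization fractions sampled during one job phase
sample = {"cpu": 0.25, "disk": 0.95, "net": 0.10, "gc": 0.30}
findings = diagnose(sample)
speedup_lower_bound = 47 / 8  # from the reported 47 min -> under 8 min
print(findings, speedup_lower_bound)
```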
Semiconductor spintronics: switching spins at low voltage
G. Salis
The emerging ability to measure and control electron spins in nano-structured materials down to the level of single spins is at the heart of the research field of spintronics, with potential applications in logic and quantum computation. In current semiconductor-based logic devices, the electron spin is a quantity that is mostly neglected. The switching functionality of a conventional field-effect transistor is based on charging a channel region with electrons. For a given on/off ratio of the source-drain current, the switching requires a minimum voltage swing related to the thermal energy, which sets a lower limit on the active power consumption of the device. Such a fundamental limitation is not present if the spin direction of electrons is switched. This observation has triggered huge interest in spintronics as a low-power alternative for logic devices.
Using existing spintronics device concepts as examples, the challenges of using spin switches in logic applications will be discussed. Very large spin filtering efficiencies are needed to use a spin switch as a drop-in replacement for FET-based current switches, setting demanding requirements for the processes of spin injection and detection. An alternative approach is to encode the digital information directly into the spin state and omit excess spin-to-charge conversion, which however requires the development of spin amplification to achieve gain in the spin domain.
Many spintronics device concepts comprise nonmagnetic regions where non-equilibrium spin polarization is switched by electric fields. There, the spins have to be processed within the respective spin lifetime. We will discuss how spin-orbit interaction limits the spin lifetime but at the same time is needed for electrical spin switching. Experimental results based on time-resolved magneto-optical Kerr rotation will be shown that demonstrate fast switching of spins in GaAs-based semiconductor quantum structures with specially engineered spin-orbit interaction, where the influence on the spin lifetime is lifted.
DOI: https://doi.org/10.1109/ISLPED.2013.6629283
Published: 2013-09-04
Citations: 2
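The thermal limit on voltage swing mentioned in the abstract above is commonly written, for an ideal charge-based switch, as V_min = (kT/q)·ln(I_on/I_off), which gives roughly 60 mV per decade of on/off ratio at room temperature. The sketch below evaluates that textbook bound; treating it as the limit the talk refers to is an assumption, and the function name and the example on/off ratio are chosen for illustration.

```python
from math import log

K_B = 1.380649e-23     # Boltzmann constant, J/K (exact SI value)
Q_E = 1.602176634e-19  # elementary charge, C (exact SI value)


def min_swing_v(on_off_ratio, temp_k=300.0):
    """Ideal thermal lower bound on gate-voltage swing for a given on/off current ratio."""
    return (K_B * temp_k / Q_E) * log(on_off_ratio)


per_decade = min_swing_v(10)
four_decades = min_swing_v(1e4)
print(f"{per_decade * 1e3:.1f} mV per decade, {four_decades * 1e3:.0f} mV for 10^4")
```

Because dynamic switching energy scales with the square of the voltage swing, a bound on the swing translates directly into a bound on active power, which is the motivation the abstract gives for switching spin direction instead of charge.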
Compiler assisted dynamic register file in GPGPU
Naifeng Jing, Haopeng Liu, Yao Lu, Xiaoyao Liang
The large Register File (RF) in General-Purpose Graphics Processing Units (GPGPUs) demands tremendous chip area and energy. To sustain the growth of RF size in future GPGPUs, emerging on-chip memory technologies such as embedded DRAM (eDRAM) have been proposed to replace conventional SRAM for higher density and lower leakage, at the possible cost of periodic refresh operations. This paper explicitly shows that the refresh penalty can be effectively mitigated by leveraging the unique characteristics of GPGPU operations. A compiler-assisted refresh rescheduling policy can greatly reduce the refresh overhead while maintaining the correctness of RF operations. The proposed scheme exploits features of both architecture and compilation, and delivers performance comparable to the SRAM counterpart. At the same time, the energy saved by removing the large SRAM leakage more than compensates for the additional refresh energy. This study promotes the eDRAM-based RF as a promising alternative that enables larger capacity and better power efficiency for future GPGPUs.
DOI: https://doi.org/10.1109/ISLPED.2013.6629258
Published: 2013-09-04
Citations: 26
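One intuition for why compiler knowledge can cut refresh overhead in an eDRAM register file is that a value only needs refreshing if it must survive longer than the cell retention time. The sketch below applies that rule to a few register live ranges. It is not the rescheduling policy of the paper, whose details the abstract does not give; the register names, cycle counts, and retention time are hypothetical.

```python
def needs_refresh(write_cycle, last_read_cycle, retention_cycles):
    """A value needs refreshing only if it stays live longer than the retention time."""
    return (last_read_cycle - write_cycle) > retention_cycles


# Hypothetical live ranges: register -> (cycle written, cycle of last read)
live_ranges = {
    "r0": (0, 120),
    "r1": (10, 5000),
    "r2": (300, 900),
    "r3": (50, 20000),
}
RETENTION_CYCLES = 4000  # assumed eDRAM retention time in cycles

to_refresh = sorted(
    reg for reg, (written, last_read) in live_ranges.items()
    if needs_refresh(written, last_read, RETENTION_CYCLES)
)
print(to_refresh)
```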
Challenges on designing electrostatic discharge protection solutions for low power electronics
J. Liou
Electrostatic discharge (ESD) is a process in which a finite amount of charge is transferred from one object (e.g., a human body) to another (e.g., a microchip). This process can result in a very high current passing through the object within a very short period of time [1-2]. When a microchip or electronic system is subject to an ESD event, the huge ESD-induced current is likely to damage the microchip and cause the electronic system to malfunction if the heat generated in the object cannot be dissipated quickly enough. It is estimated that about 35% of all damaged microchips are ESD related, resulting in a revenue loss of several hundred million dollars in the global semiconductor industry every year [3]. The continued shrinking of MOS devices makes ESD-induced failures even more prominent, and one can predict with certainty that the availability of effective and robust ESD protection solutions will become critical to the successful development of CMOS-based integrated circuits [4-7].
DOI: https://doi.org/10.1109/ISLPED.2013.6629303
Published: 2013-09-04
Citations: 1
Understanding the critical path in power state transition latencies
S. Xi, Marisabel Guevara, Jared Nelson, Patrick Pensabene, Benjamin C. Lee
Increasing demands on datacenter computing prompt research in energy-efficient warehouse-scale systems. In one approach, server activation policies invoke low-power sleep states, but the power state transition latency must be small to produce effective energy savings. Chrome OS and Arch Linux require 50 ms and 650 ms, respectively, to enter sleep states. These states consume merely 4-6% of nominal power. By analyzing the critical path, we propose strategies for selecting hardware components and optimizing kernel resume sequences to make datacenter server activation viable. With fast transitions, server activation can provide better performance at lower energy than dynamic voltage and frequency scaling.
DOI: https://doi.org/10.1109/ISLPED.2013.6629316. Published 2013-09-04.
Citations: 9
Design and analysis of 3D IC-based low power stereo matching processors
Seung-Ho Ok, Kyeong-Ryeol Bae, S. Lim, Byungin Moon
This paper presents comprehensive design and analysis results for 3D IC-based low-power stereo matching processors. Our design efforts range from architecture design and verification to RTL-to-GDSII design and sign-off analysis based on the GlobalFoundries 130-nm PDK. We conduct comprehensive studies on the area, performance, and power benefits of our 3D IC designs over 2D IC designs. Our 2-tier 3D IC designs attain savings of 43% in area, 14% in wire length, and 13% in power over 2D IC designs. We also study a pipeline-based partitioning method shown to be effective at minimizing power consumption and the total number of TSVs while balancing the size of each tier.
DOI: https://doi.org/10.1109/ISLPED.2013.6629260. Published 2013-09-04.
Citations: 6