
ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005: latest papers

Power reduction by varying sampling rate
W. Dieter, S. Datta, Wong Key Kai
The rate at which a digital signal processing (DSP) system operates depends on the highest frequency component in the input signal. DSP applications must sample their inputs at a frequency at least twice the highest frequency in the input signal (i.e., the Nyquist rate) to accurately reproduce the signal. Typically, a fixed sampling rate that is guaranteed always to be high enough is used. However, an input signal may contain periods with little high-frequency content, as well as periods of silence. When the input signal has no perceptible high-frequency components, the system can reduce its sampling rate, thereby reducing the number of samples processed per second and allowing the CPU speed to be scaled down without reducing output quality. This paper describes how to reduce power consumption in DSP applications by varying the amount of processing based on the input signal, and reports results of experiments with a prototype implementation. Experiments with the prototype show that when the system performs little processing, the added overhead of the variable sampling rate technique increases power consumption. When the system performs more processing (18 FIR filters per frame), power consumption was reduced to 40% of the power required with a static sampling rate, without reducing sound quality.
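A minimal sketch of the rate-selection step, under assumptions that are not from the paper: the candidate rates, the 1% relative-magnitude threshold for "significant" content, and the function names `highest_significant_freq` and `choose_rate` are all hypothetical. For each frame it estimates the highest significant frequency with a plain DFT and picks the lowest candidate rate that is above twice that frequency (the Nyquist criterion).

```python
import cmath
import math

# Hypothetical set of sampling rates (Hz) the system can switch between.
CANDIDATE_RATES_HZ = (8_000, 16_000, 22_050, 44_100)

def highest_significant_freq(frame, fs, rel_threshold=0.01):
    """Highest frequency (Hz) whose DFT magnitude is at least rel_threshold
    times the largest magnitude in the frame. Uses a naive O(N^2) DFT for
    clarity; a real implementation would use an FFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(abs(acc))
    peak = max(mags)
    if peak == 0.0:
        return 0.0  # silent frame: no frequency content at all
    top_bin = max(k for k, m in enumerate(mags) if m >= rel_threshold * peak)
    return top_bin * fs / n

def choose_rate(f_max_hz, rates=CANDIDATE_RATES_HZ):
    """Lowest candidate rate strictly above the Nyquist rate 2 * f_max."""
    for r in sorted(rates):
        if r > 2 * f_max_hz:
            return r
    return max(rates)  # content exceeds every option: fall back to the highest rate
```

For example, a 10 ms frame holding only a 1 kHz tone would be processed at 8 kHz instead of 44.1 kHz. Whether this saves power in practice depends on the cost of the rate decision itself, which, as the abstract notes, outweighed the savings when little processing was done per frame.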
DOI: 10.1145/1077603.1077658 (https://doi.org/10.1145/1077603.1077658), published 2005-08-08
Citations: 57
Self-timed circuits for energy harvesting AC power supplies
J. Siebert, J. Collier, R. Amirtharajah
The recent explosion in the capability of embedded and portable electronics has not been matched by battery technology. The slow growth of battery energy density has limited device lifetime and added weight and volume. Passive energy harvesting from vibration has potentially wide application in wearable and embedded sensors, where it can complement or replace batteries. The authors proposed increasing energy harvesting efficiency by eliminating AC/DC conversion electronics. Self-timed circuits, power-on-reset circuitry, and memory for energy harvesting AC power supplies have been investigated. The power-on-reset circuit achieves a substantial improvement over conventional approaches, with 4.1 nW of simulated power dissipation and a frequency-independent turn-on voltage. A chip is being fabricated to test the circuits presented here.
DOI: 10.1145/1077603.1077678 (https://doi.org/10.1145/1077603.1077678), published 2005-08-08
Citations: 14
Coordinated, distributed, formal energy management of chip multiprocessors
Philo Juang, Qiang Wu, L. Peh, M. Martonosi, D. Clark
Designers are moving toward chip multiprocessors (CMPs) to leverage application parallelism for higher performance while keeping design complexity under control. However, to date, no power management techniques have been proposed for coordinated power control of multiple processor cores. In this paper, we illustrate how the use of local, per-tile dynamic voltage and frequency scaling (DVFS) techniques can result in tiles counteracting each other's power management policies, significantly hurting chip power-performance. We then propose a coordinated DVFS scheme for CMPs that eliminates these oscillations and ensures efficient and resilient DVFS control. Specifically, our proposed technique incorporates thread information collected at runtime across the chip. In addition, by extending a control-theoretic local DVFS technique to chip multiprocessors, our technique formally prescribes DVFS settings at each tile, ensuring stable, distributed, coordinated DVFS control of a CMP. Experimental results show that our technique achieves a 15.5% improvement in energy-delay product over a CMP with no DVFS control, and a 1% improvement in energy-delay product over the latest state-of-the-art local DVFS scheme.
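The local, per-tile DVFS loops that the abstract starts from can be pictured as a feedback controller on each tile's input-queue occupancy. The sketch below is a plain proportional controller snapped to discrete operating points; the frequency levels, target occupancy, gain, and the name `next_frequency` are hypothetical, and this is not the paper's formal control-theoretic design or its coordination scheme.

```python
# Hypothetical discrete DVFS operating points (GHz) available to one tile.
LEVELS_GHZ = (0.5, 0.75, 1.0, 1.25, 1.5)

def next_frequency(f_now, queue_occupancy, target=0.5, gain=1.0, levels=LEVELS_GHZ):
    """Proportional control on input-queue occupancy (0.0 = empty, 1.0 = full):
    speed up when the queue is fuller than the target, slow down when it is
    emptier, then snap to the nearest available operating point."""
    error = queue_occupancy - target       # > 0: falling behind, < 0: idle slack
    f_desired = f_now + gain * error
    return min(levels, key=lambda f: abs(f - f_desired))
```

Run independently on every tile, loops like this react only to local queue state, which is how neighboring tiles can end up counteracting each other; the coordinated scheme in the paper addresses this with thread information collected across the chip.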
DOI: 10.1145/1077603.1077637 (https://doi.org/10.1145/1077603.1077637), published 2005-08-08
Citations: 99
A low-power, multichannel gated oscillator-based CDR for short-haul applications
A. Tajalli, P. Muller, S. M. Atarodi, Y. Leblebici
A gated current-controlled oscillator (GCCO) based topology is used to implement a low-power multichannel clock and data recovery (CDR) system in a 0.18 µm digital CMOS technology. A systematic approach is presented for designing a reliable, low-power system from the required specifications. Behavioral simulations are also used to estimate the achievable bit error rate (BER), jitter tolerance (JTOL), and frequency offset tolerance (FTOL) of the proposed CDR. Using a single 1.8 V supply voltage, the proposed 20 Gbps 8-channel CDR consumes only 70.2 mW, or 3.51 mW/channel/Gbps, while occupying 0.045 mm² of silicon area.
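The efficiency figure follows from the totals quoted above, assuming the 20 Gbps aggregate is split evenly across the 8 channels (2.5 Gbps each). Since 1 mW/(Gb/s) equals 1 pJ/bit, the result can also be read as about 3.51 pJ per recovered bit.

```latex
\[
\frac{P}{N\,R_{\mathrm{ch}}}
= \frac{70.2\ \mathrm{mW}}{8 \times 2.5\ \mathrm{Gb/s}}
= \frac{8.775\ \mathrm{mW/channel}}{2.5\ \mathrm{Gb/s}}
= 3.51\ \mathrm{mW/channel/(Gb/s)}
\approx 3.51\ \mathrm{pJ/bit}
\]
```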
DOI: 10.1145/1077603.1077631 (https://doi.org/10.1145/1077603.1077631), published 2005-08-08
Citations: 1
Effectiveness of low power dual-Vt designs in nano-scale technologies under process parameter variations
A. Agarwal, Kunhyuk Kang, S. Bhunia, J. D. Gallagher, K. Roy
This paper explores the effectiveness of dual-Vt design under aggressive technology scaling, which significantly increases all components of leakage (subthreshold, gate, and junction tunneling) while also bringing large variations in process parameters. The present way of realizing high-Vt devices results in high junction tunneling leakage compared with low-Vt devices, which in turn may make the leakage savings of dual-Vt designs negligible in scaled technologies. Moreover, increasing process variation severely affects the yield of such designs. This paper suggests important measures that need to be incorporated into conventional dual-Vt design to reduce total leakage power while ensuring yield. It also shows that different process options, such as metal gate work function engineering, are required to realize high-performance, low-leakage dual-Vt designs in sub-50 nm technologies.
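The sketch below illustrates why raising Vt can yield little net saving once junction leakage grows. It uses the textbook exponential dependence of subthreshold current on threshold voltage, I_sub ∝ exp(−Vt / (n·kT/q)); the slope factor n, the per-device leakage values, and the multiplicative junction-leakage penalty for high-Vt devices are hypothetical placeholders, not numbers or models from the paper.

```python
import math

THERMAL_VOLTAGE_V = 0.02585  # kT/q at roughly 300 K

def subthreshold_scale(delta_vt, n=1.5, vt_thermal=THERMAL_VOLTAGE_V):
    """Factor by which subthreshold leakage shrinks when Vt rises by delta_vt volts."""
    return math.exp(-delta_vt / (n * vt_thermal))

def total_leakage(i_sub, i_gate, i_junction):
    return i_sub + i_gate + i_junction

def high_vt_saving(i_sub, i_gate, i_junction, delta_vt, junction_penalty):
    """Fractional leakage saving of a high-Vt device over a low-Vt one.
    Assumes (hypothetically) that the high-Vt process multiplies junction
    leakage by junction_penalty and leaves gate leakage unchanged."""
    low = total_leakage(i_sub, i_gate, i_junction)
    high = total_leakage(i_sub * subthreshold_scale(delta_vt),
                         i_gate,
                         i_junction * junction_penalty)
    return 1.0 - high / low
```

With subthreshold leakage dominant, a 100 mV Vt increase removes most of the leakage; when junction tunneling dominates and grows in the high-Vt device, the net saving vanishes or even turns negative, which is the effect the abstract warns about in scaled technologies.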
DOI: 10.1145/1077603.1077609 (https://doi.org/10.1145/1077603.1077609), published 2005-08-08
Citations: 15
Inter-program optimizations for conserving disk energy
Jerry Hom, U. Kremer
Previous work has shown that intra-program optimizations, i.e., optimizations performed on individual programs in isolation, can be very effective in reducing disk energy in streaming applications. This paper investigates the potential additional benefits of inter-program optimizations where sets of programs are optimized together. Experimental results on different subsets of three streaming applications show that 7-49% additional energy savings (27.3% on average) can be obtained with negligible performance penalties using two novel inter-program optimizations, namely execution context sensitive buffer size selection and inverse barrier synchronization. These figures were obtained via physical measurements on two laptop disks.
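A simplified way to see why optimizing several streaming programs together can save disk energy is a buffer-sizing rule that depends on which programs run concurrently and lines up their disk refills so the idle periods coincide. The model below is a hypothetical simplification, not the authors' algorithm: it ignores refill time and seek costs, and `break_even_s` (how long the disk must stay spun down for spinning down to save energy) and `memory_budget_mb` are assumed parameters.

```python
def aligned_buffers(rates_mb_s, break_even_s, memory_budget_mb):
    """Hypothetical sizing rule: give every concurrently running stream a
    buffer that lasts the same period, so all disk refills happen together
    and the disk sees one long idle interval per period.

    Returns (period_s, buffer sizes in MB), or None if the memory budget
    cannot make the idle interval longer than the spin-down break-even time."""
    total_rate = sum(rates_mb_s)
    if total_rate <= 0:
        raise ValueError("need at least one stream with a positive rate")
    period_s = memory_budget_mb / total_rate   # longest period the budget allows
    if period_s <= break_even_s:
        return None                            # spinning down would not pay off
    return period_s, [r * period_s for r in rates_mb_s]
```

For example, streams consuming 1 MB/s and 3 MB/s with an 80 MB budget would both refill every 20 s, using 20 MB and 60 MB buffers; the same pair with only 30 MB would leave 7.5 s idle periods, too short for an assumed 10 s break-even time.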
DOI: 10.1145/1077603.1077684 (https://doi.org/10.1145/1077603.1077684), published 2005-08-08
Citations: 12
A tunable bus encoder for off-chip data buses
D. Suresh, B. Agrawal, Jun Yang, W. Najjar
Off-chip buses account for a significant portion of total system power in embedded systems. Past research has focused on encoding contiguous bit positions in data values to reduce transition activity on off-chip data buses. In this paper, the authors proposed a tunable bus encoding (TUBE) scheme to reduce data-bus power consumption; it exploits repetition in both contiguous and non-contiguous bit positions to encode data values. The authors also solved the problem of keeping just one control signal in the codec design. The results were compared with some of the best existing schemes, such as frequent value encoding (FVE) and FV-MSB-LSB encoding. The scheme achieves an improvement of 21% on average, and up to 28% on some benchmarks, over FVE, and up to 84% over unencoded data. Compared with FV-MSB-LSB encoding, the presented scheme improves energy savings by 10% on average and by up to 21% for some media applications, at the cost of a minimal 0.45% performance overhead. A hardware design of the codec was presented, together with a detailed analysis of its area, delay, and energy overhead. The codec can be easily implemented in an on-chip memory controller, with a small area requirement of 0.0521 mm².
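For readers unfamiliar with frequent-value encoding, the baseline that TUBE is compared against, here is a simplified sketch of the general idea: a word found in a small table of frequent values is sent by toggling a single bus wire chosen by its table index, and one control line tells the decoder which mode was used. The table contents, the XOR-based single-wire toggling, the 32-bit width, and the function names are assumptions for illustration; this is neither TUBE nor necessarily the exact FVE design.

```python
def fv_encode(words, table, width=32):
    """Encode a word stream into (bus_value, control_bit) pairs, one per cycle.
    The bus is assumed to start at 0."""
    if len(table) > width:
        raise ValueError("table has more entries than bus wires")
    index = {value: i for i, value in enumerate(table)}
    bus = 0
    frames = []
    for word in words:
        if word in index:
            bus ^= 1 << index[word]  # frequent value: toggle exactly one wire
            frames.append((bus, 1))
        else:
            bus = word               # other values: send unencoded
            frames.append((bus, 0))
    return frames

def fv_decode(frames, table):
    """Invert fv_encode, starting from the same all-zero bus state."""
    prev_bus = 0
    words = []
    for bus, ctrl in frames:
        if ctrl:
            toggled = bus ^ prev_bus  # exactly one bit is set here
            words.append(table[toggled.bit_length() - 1])
        else:
            words.append(bus)
        prev_bus = bus
    return words

def count_transitions(frames):
    """Total wire toggles between consecutive cycles, including the control line."""
    prev_bus, prev_ctrl, total = 0, 0, 0
    for bus, ctrl in frames:
        total += bin(prev_bus ^ bus).count("1") + (prev_ctrl ^ ctrl)
        prev_bus, prev_ctrl = bus, ctrl
    return total
```

On a stream alternating between all-zeros and all-ones words, with both values in the table, this sketch toggles 5 wires in total (including the control line) versus 96 for the same words sent raw, i.e. `count_transitions([(w, 0) for w in words])`.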
DOI: 10.1145/1077603.1077680 (https://doi.org/10.1145/1077603.1077680), published 2005-08-08
Citations: 13
A low-power crossroad switch architecture and its core placement for network-on-chip
Kuei-Chung Chang, Jih-Sheng Shen, Tien-Fu Chen
As the number of cores on a chip increases, the power consumed by communication structures takes a significant portion of the overall power budget. The individual components of SoCs will be heterogeneous in nature, with widely varying functionality and communication requirements, so the communication topology should, where possible, match the communication workflows among these components. In this paper, the authors first proposed an interconnection architecture for SoCs that uses crossroad switches to dynamically construct a dedicated communication path between any two cores. They then presented a design methodology for constructing a network on chip (NoC) for application-specific computer systems with profiled communication characteristics, along with a core placement tool that automatically maps cores to a communication topology so that total communication energy is minimized. Experimental results show that the methodology can generate optimized on-chip networks with fewer resources than meshes and tori, with power savings of approximately 40%.
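The objective a core placement tool minimizes can be illustrated on a small mesh: communication energy is assumed proportional to traffic volume times hop count, and the best placement is found by exhaustive search. Both the mesh topology and the brute-force search are assumptions for illustration; the paper targets crossroad-switch networks, and the abstract does not describe its tool's algorithm. The traffic values below are hypothetical.

```python
from itertools import permutations

def hops(a, b):
    """Manhattan distance between two (row, col) tiles, i.e. the XY-routing hop count."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def comm_cost(placement, traffic):
    """Sum of traffic volume x hop count; placement maps core -> tile."""
    return sum(vol * hops(placement[src], placement[dst])
               for (src, dst), vol in traffic.items())

def best_placement(cores, tiles, traffic):
    """Exhaustive search over core-to-tile assignments.
    Only practical for a handful of cores; real tools use heuristics."""
    best = None
    for perm in permutations(tiles, len(cores)):
        placement = dict(zip(cores, perm))
        cost = comm_cost(placement, traffic)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best
```

With heavy traffic between cores 0 and 3 and between cores 1 and 2 on a 2×2 mesh, placing each heavy pair on diagonal tiles costs 400, while the best placement puts each pair on adjacent tiles for a cost of 200.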
DOI: 10.1145/1077603.1077693 (https://doi.org/10.1145/1077603.1077693), published 2005-08-08
Citations: 19
Energy-aware fetch mechanism: trace cache and BTB customization
D. Chaver, Miguel A. Rojas, L. Piñuel, M. Prieto, F. Tirado, Michael C. Huang
A highly efficient fetch unit is essential not only for good performance but also for energy efficiency. However, existing designs are inflexible and, depending on program behavior, can be either insufficient or overkill. We introduce a phase-based adaptive fetch mechanism that is dynamically adjusted based on feedback about program behavior. This design adds very little hardware complexity and relegates complex tasks to the software components. It is also very effective: compared with a conventional fetch unit and a trace cache-based fetch unit, it saves 26.8% and 34.1% of fetch energy on average, respectively, while improving performance by 5.7% and 0.6%, respectively.
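A phase-based adaptive scheme that keeps the hardware simple and leaves decisions to software might look, in outline, like the tuner below. The configuration names, the explore-once-then-exploit policy, and the use of energy-delay product (EDP) as the selection metric are assumptions for illustration, not details given in the abstract.

```python
class PhaseTuner:
    """Hypothetical software-side tuner: for each program phase, try every
    fetch-unit configuration once, then keep using the one with the lowest
    measured energy-delay product."""

    def __init__(self, configs):
        self.configs = list(configs)
        self.edp = {}  # phase id -> {config: measured EDP}

    def choose(self, phase):
        seen = self.edp.setdefault(phase, {})
        for config in self.configs:
            if config not in seen:
                return config           # still exploring this phase
        return min(seen, key=seen.get)  # exploit the best configuration found

    def report(self, phase, config, energy, delay):
        """Record feedback (e.g. from hardware counters) for one interval."""
        self.edp.setdefault(phase, {})[config] = energy * delay
```

A new phase restarts exploration, so the cost of trying configurations is paid once per distinct phase rather than continuously.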
DOI: 10.1145/1077603.1077615 (https://doi.org/10.1145/1077603.1077615), published 2005-08-08
Citations: 11
A low-power bus design using joint repeater insertion and coding
S. Sridhara, Naresh R Shanbhag
In this paper, we propose joint repeater insertion and crosstalk avoidance coding as a low-power alternative to repeater insertion alone for global bus design in nanometer technologies. We develop a methodology to calculate the repeater size and separation that minimize the total power dissipation of joint repeater insertion and coding for a given delay target. This methodology is employed to obtain power vs. delay trade-offs for the 130-nm, 90-nm, 65-nm, and 45-nm technology nodes. Using ITRS technology scaling data, we show that the proposed technique provides 54%, 67%, and 69% power savings over an optimally repeater-inserted 10-mm 32-bit bus at the 90-nm, 65-nm, and 45-nm technology nodes, respectively, while achieving the same delay.
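The interaction between coding and repeaters can be sketched with Bakoglu's classic closed-form results for delay-optimal repeater count k and size h on a distributed RC line, combined with a commonly used worst-case coupling model in which a middle wire's effective capacitance is (1 + 4λ)·Cg uncoded and (1 + 2λ)·Cg under a crosstalk-avoidance code, with λ the coupling-to-ground capacitance ratio. This is not the paper's methodology, which minimizes power for a given delay target; the function names and all numeric values are hypothetical.

```python
import math

def bakoglu_repeaters(r_wire, c_wire, r_drv, c_drv):
    """Delay-optimal repeater count k and size h (in multiples of a minimum
    inverter with output resistance r_drv and input capacitance c_drv) for a
    line with total resistance r_wire and total capacitance c_wire."""
    k = math.sqrt(0.4 * r_wire * c_wire / (0.7 * r_drv * c_drv))
    h = math.sqrt(r_drv * c_wire / (r_wire * c_drv))
    return k, h

def effective_cap(c_ground, coupling_ratio, coded):
    """Worst-case effective capacitance of a middle bus wire:
    (1 + 4*lambda)*Cg uncoded, (1 + 2*lambda)*Cg with crosstalk-avoidance coding."""
    factor = 2 if coded else 4
    return (1 + factor * coupling_ratio) * c_ground
```

Because both k and h grow with the square root of the line capacitance, the coded bus needs fewer and smaller repeaters per wire for the same line, which reduces repeater switching power. Crosstalk-avoidance codes typically need more wires than the data width, however, and this sketch ignores that overhead.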
DOI: 10.1145/1077603.1077629 (https://doi.org/10.1145/1077603.1077629), published 2005-08-08
Citations: 8