
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors: latest papers

Improving processor performance by simplifying and bypassing trivial computations
J. Yi, D. Lilja
During a program's execution, a processor performs many trivial computations; that is, computations that can be simplified or whose result is zero, one, or equal to one of the input operands. This paper shows that, even when a program is compiled with aggressive optimizations (-O3), approximately 30% of all arithmetic instructions, which account for 12% of all dynamic instructions, are trivial computations. The amount of trivial computation is not heavily dependent on the program's specific input values. Our results show that eliminating trivial computations dynamically at run-time yields an average speedup of 8% for a typical processor. Even for a very aggressive processor (i.e., one with no functional unit constraints), the average speedup is still 6%. The hardware area cost of dynamically detecting and eliminating these trivial computations is also very low, consisting of only a few comparators and multiplexers.
DOI: https://doi.org/10.1109/ICCD.2002.1106814
Citations: 35
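The Yi and Lilja abstract above describes trivial computations as those that can be simplified or whose result is zero, one, or one of the input operands. The sketch below illustrates that idea in software. The specific cases it checks (x+0, x-x, x*0, x*1, x/1, x/x, and so on) and the names bypass_trivial and trivial_fraction are assumptions made for illustration; the abstract does not list the exact cases the paper's hardware detects.

```python
from typing import Iterable, Optional, Tuple


def bypass_trivial(op: str, a: int, b: int) -> Optional[int]:
    """Return the result if (op, a, b) is trivial, else None.

    The case list is an illustrative assumption, not the paper's table.
    """
    if op == "add":
        if a == 0:
            return b          # 0 + b = b
        if b == 0:
            return a          # a + 0 = a
    elif op == "sub":
        if b == 0:
            return a          # a - 0 = a
        if a == b:
            return 0          # a - a = 0
    elif op == "mul":
        if a == 0 or b == 0:
            return 0          # anything times 0 is 0
        if a == 1:
            return b          # 1 * b = b
        if b == 1:
            return a          # a * 1 = a
    elif op == "div":
        if b == 0:
            return None       # not trivial: leave division by zero to the normal path
        if a == 0:
            return 0          # 0 / b = 0
        if b == 1:
            return a          # a / 1 = a
        if a == b:
            return 1          # a / a = 1
    return None


def trivial_fraction(trace: Iterable[Tuple[str, int, int]]) -> float:
    """Fraction of (op, a, b) records in a trace that are trivial."""
    records = list(trace)
    if not records:
        return 0.0
    hits = sum(1 for op, a, b in records if bypass_trivial(op, a, b) is not None)
    return hits / len(records)
```

Each check is an equality comparison against 0, 1, or the other operand, followed by a selection of the result. That is consistent with the abstract's remark that the hardware needs only a few comparators and multiplexers, although this software model says nothing about timing or area.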
A low energy set-associative I-Cache with extended BTB
Koji Inoue, V. Moshnyaga, K. Murakami
This paper proposes a low-energy instruction-cache architecture called the history-based tag-comparison (HBTC) cache. The HBTC cache re-uses tag-comparison results to avoid unnecessary way activation in set-associative caches. The cache records tag-comparison results in an extended branch target buffer (BTB) and re-uses them to directly select only the hit way, which holds the target instruction. In our simulation, the HBTC cache achieves a 62% energy reduction with less than 1% performance degradation, compared with a conventional cache.
DOI: https://doi.org/10.1109/ICCD.2002.1106768
Citations: 9
Low-power, high-speed CMOS VLSI design
T. Kuroda
Ubiquitous computing is a next generation information technology where computers and communications will be scaled further, merged together, and materialized in consumer applications. Computers will be invisible behind broadband networks as servers, while terminals will come closer to us as wearable/implantable devices, more friendly devices with sophisticated human-computer interactions. IC chips will be implanted everywhere so that things can think and talk for distributed information processing. Key technologies here are low power, low cost, and good interfaces, especially for wireless data communications. Low-power, high-speed CMOS circuit techniques are presented in this paper, including low-voltage design with variable/multiple V_DD/V_TH control, embedded memory technology for reducing capacitance, and low-switching activity design.
DOI: https://doi.org/10.1109/ICCD.2002.1106787
Citations: 37
Subword sorting with versatile permutation instructions
Z. Shi, R. Lee
Subword parallelism has succeeded in accelerating many multimedia applications. Subword permutation instructions have been proposed to efficiently rearrange subwords in or among registers. Bit-level permutation instructions have also been proposed recently for their importance in cryptography. However, important algorithms, especially those with many conditional control dependencies such as sorting, have not exploited the advantage of subword parallel instructions. In this paper, we show how one of the bit permutation instructions, GRP, can be used for fast sorting. In the process, we demonstrate the versatility of this permutation instruction for uses other than bit permutations. This versatility is important in considering the addition of a new instruction to a general-purpose processor. The results show that our sorting methods have a significant speedup even when compared with the fastest sorting algorithms. We also discuss the hardware implementation of the GRP instruction and compare its latency to a typical processor's cycle time.
DOI: https://doi.org/10.1109/ICCD.2002.1106776
Citations: 21
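The Shi and Lee abstract above does not define GRP, so the sketch below rests on an assumption about its semantics: bits of the source whose control bit is 0 move to the left (high) group, bits whose control bit is 1 move to the right (low) group, and order within each group is preserved. Under that assumption GRP is a stable partition. The second function shows why a stable partition is useful for sorting: applying it to whole keys, one key bit at a time from least to most significant, is an ordinary least-significant-digit radix sort. This is an element-level illustration, not the paper's subword sorting method, and the names grp and radix_sort_by_partition are hypothetical.

```python
from typing import List, Sequence


def grp(x: int, c: int, width: int) -> int:
    """Bit-level GRP under the assumed semantics described above."""
    zeros: List[int] = []  # bits of x whose control bit is 0 (left group)
    ones: List[int] = []   # bits of x whose control bit is 1 (right group)
    for i in range(width - 1, -1, -1):  # scan from MSB to LSB to keep order
        bit = (x >> i) & 1
        if (c >> i) & 1:
            ones.append(bit)
        else:
            zeros.append(bit)
    result = 0
    for bit in zeros + ones:
        result = (result << 1) | bit
    return result


def radix_sort_by_partition(keys: Sequence[int], key_bits: int) -> List[int]:
    """Sort non-negative keys below 2**key_bits by repeated stable partition."""
    items = list(keys)
    for b in range(key_bits):  # least significant key bit first
        zeros = [k for k in items if not (k >> b) & 1]
        ones = [k for k in items if (k >> b) & 1]
        items = zeros + ones   # stable: order inside each group is kept
    return items
```

For example, grp(0b1011, 0b0101, 4) collects the bits at positions 3 and 1 on the left and those at positions 2 and 0 on the right, and radix_sort_by_partition([5, 2, 7, 0, 3], 3) needs three partition passes, one per key bit.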
Locating tiny sensors in time and space: a case study
Lewis Girod, Vladimir Bychkovskiy, J. Elson, D. Estrin
As the cost of embedded sensors and actuators drops, new applications will arise that exploit high density networks of small devices capable of a variety of sensing tasks. Although individual devices may have limited functionality, the true value of the system comes from the emergent behavior that arises when data from many places in the system is combined. This type of data fusion has a number of requirements, but two of the most important are: 1) synchronized time, precise enough to resolve movement in the sensed phenomenon (e.g., sound); and 2) known geographic locations, on a similar scale to the sensors' size and deployment density. However, the installation cost of a localization system with sufficient granularity is considerable, because of the large amount of effort required to deploy such a system and make all the measurements required to tune it. In this paper, we describe a system based on COTS components that incorporates our novel time synchronization and acoustic ranging techniques. The result is a low-cost, readily available platform for distributed, coherent signal processing.
DOI: https://doi.org/10.1109/ICCD.2002.1106773
Citations: 287
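The Girod et al. abstract above pairs time synchronization with acoustic ranging. The sketch below shows the basic time-of-flight arithmetic that links the two: a range estimate is the speed of sound multiplied by the travel time, so any uncorrected clock offset between sender and receiver turns directly into a range error. It is a generic illustration rather than the authors' technique. The function names are hypothetical, and the speed of sound is taken as roughly 343 m/s (dry air near 20 °C).

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate, dry air near 20 degrees C


def acoustic_range_m(t_emit_s: float, t_arrive_s: float,
                     clock_offset_s: float = 0.0) -> float:
    """Estimate distance from a one-way acoustic time of flight.

    t_emit_s is read from the sender's clock, t_arrive_s from the
    receiver's clock. clock_offset_s is the receiver clock minus the
    sender clock, as estimated by time synchronization.
    """
    time_of_flight_s = (t_arrive_s - clock_offset_s) - t_emit_s
    if time_of_flight_s < 0:
        raise ValueError("negative time of flight: check the clock offset")
    return SPEED_OF_SOUND_M_PER_S * time_of_flight_s


def range_error_m(sync_error_s: float) -> float:
    """Range error caused by an uncorrected synchronization error."""
    return SPEED_OF_SOUND_M_PER_S * abs(sync_error_s)
```

With these numbers, a 10 ms time of flight corresponds to about 3.43 m, and a 1 ms synchronization error shifts the estimate by about 0.34 m, which shows how closely ranging precision depends on synchronization precision.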