首页 > 最新文献

Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)最新文献

英文 中文
HomeRun 全垒打
Chih-Kai Kang, Chun-Han Lin, P. Hsiu, Ming-Syan Chen
Self-powered intermittent systems featuring nonvolatile processors (NVPs) allow for accumulative execution in unstable power environments. However, frequent power failures may cause incorrect NVP execution results due to invalid data generated intermittently. This paper presents a HW/SW co-design, called HomeRun, to guarantee atomicity by ensuring that an uninterruptible program section can be run through at one execution. We design a HW module to ensure that a power pulse is sufficient for an atomic section, and develop a SW mechanism for programmers to protect atomic sections. The proposed design is validated through the development of a prototype pattern locking system. Experimental results demonstrate that the proposed design can completely guarantee atomicity and significantly improve the energy utilization of self-powered intermittent systems.
{"title":"HomeRun","authors":"Chih-Kai Kang, Chun-Han Lin, P. Hsiu, Ming-Syan Chen","doi":"10.1145/3218603.3218633","DOIUrl":"https://doi.org/10.1145/3218603.3218633","url":null,"abstract":"Self-powered intermittent systems featuring nonvolatile processors (NVPs) allow for accumulative execution in unstable power environments. However, frequent power failures may cause incorrect NVP execution results due to invalid data generated intermittently. This paper presents a HW/SW co-design, called HomeRun, to guarantee atomicity by ensuring that an uninterruptible program section can be run through at one execution. We design a HW module to ensure that a power pulse is sufficient for an atomic section, and develop a SW mechanism for programmers to protect atomic sections. The proposed design is validated through the development of a prototype pattern locking system. Experimental results demonstrate that the proposed design can completely guarantee atomicity and significantly improve the energy utilization of self-powered intermittent systems.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78372311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Designing Efficient Imprecise Adders using Multi-bit Approximate Building Blocks 利用多比特近似构建块设计高效的不精确加法器
Sarvenaz Tajasob, Morteza Rezaalipour, M. Dehyadegari, M. N. Bojnordi
Energy-efficiency has become a major concern in designing computer systems. One of the most promising solutions to enhance power and energy-efficiency in error tolerant applications is approximate computing that balances accuracy, area, delay, and power consumption based on the computational needs. By trading accuracy of computation, approximate computing may achieve significant improvements in speed, power, and area consumption. Adders are important arithmetic units widely used in almost every digital processing system, which contribute to significant amounts of power dissipation. With the emergence of deep learning tasks and fault tolerant big data processing in every aspect of today's computing, the demand for low-power and energy-efficient approximate adders has increased significantly. Numerous designs have been proposed in the literature that build multi-bit adders using novel approximate full adder circuits. Regrettably, relying on single-bit building blocks only limits the design space of approximate adders and prevents the designers from achieving the most significant benefits of approximate circuits. This paper presents a novel approach to designing imprecise multi-bit adders, based on four novel approximate 2 and 3-bit adder building blocks. The proposed circuits are evaluated and compared with the existing low power adders in terms of various design characteristics, such as area, delay, power, and error tolerance. Our simulation results indicate that the proposed adders achieve more than 60% reduction in power and area consumption compared to the state-of-the-art approximate adders while introducing 12-17% less error in computation.
能源效率已成为设计计算机系统的一个主要问题。在容错应用程序中提高功率和能源效率的最有前途的解决方案之一是近似计算,它根据计算需求平衡精度、面积、延迟和功耗。通过牺牲计算的准确性,近似计算可以在速度、功率和面积消耗方面取得显著的改进。加法器是几乎所有数字处理系统中广泛使用的重要算术单元,它造成了大量的功耗。随着深度学习任务和容错大数据处理在当今计算的各个方面的出现,对低功耗、节能的近似加法器的需求显著增加。文献中提出了许多设计,使用新颖的近似全加法器电路构建多比特加法器。遗憾的是,依赖于单比特构建块只会限制近似加法器的设计空间,并阻止设计者实现近似电路的最显著优势。本文提出了一种设计不精确多位加法器的新方法,该方法基于四个新颖的近似2位和3位加法器构建块。对所提出的电路进行了评估,并与现有的低功率加法器在各种设计特性(如面积、延迟、功率和容错性)方面进行了比较。我们的仿真结果表明,与最先进的近似加法器相比,所提出的加法器的功耗和面积消耗降低了60%以上,而计算误差减少了12-17%。
{"title":"Designing Efficient Imprecise Adders using Multi-bit Approximate Building Blocks","authors":"Sarvenaz Tajasob, Morteza Rezaalipour, M. Dehyadegari, M. N. Bojnordi","doi":"10.1145/3218603.3218638","DOIUrl":"https://doi.org/10.1145/3218603.3218638","url":null,"abstract":"Energy-efficiency has become a major concern in designing computer systems. One of the most promising solutions to enhance power and energy-efficiency in error tolerant applications is approximate computing that balances accuracy, area, delay, and power consumption based on the computational needs. By trading accuracy of computation, approximate computing may achieve significant improvements in speed, power, and area consumption. Adders are important arithmetic units widely used in almost every digital processing system, which contribute to significant amounts of power dissipation. With the emergence of deep learning tasks and fault tolerant big data processing in every aspect of today's computing, the demand for low-power and energy-efficient approximate adders has increased significantly. Numerous designs have been proposed in the literature that build multi-bit adders using novel approximate full adder circuits. Regrettably, relying on single-bit building blocks only limits the design space of approximate adders and prevents the designers from achieving the most significant benefits of approximate circuits. This paper presents a novel approach to designing imprecise multi-bit adders, based on four novel approximate 2 and 3-bit adder building blocks. The proposed circuits are evaluated and compared with the existing low power adders in terms of various design characteristics, such as area, delay, power, and error tolerance. Our simulation results indicate that the proposed adders achieve more than 60% reduction in power and area consumption compared to the state-of-the-art approximate adders while introducing 12-17% less error in computation.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80403239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
EcoMicro EcoMicro
Cheng-Ting Lee, Yun-Hao Liang, Pai H. Chou, A. Gorji, Seyede Mahya Safavi, Wen-Chan Shih, Wen-Tsuen Chen
This paper describes EcoMicro, a miniature, self-powered, wireless inertial-sensing node in the volume of 8 x 13 x 9.5 mm3, including energy storage and solar cells. It is smaller than existing systems with similar functionality while retaining rich functionality and efficiency. It is capable of measuring motion using a inertial measurement unit (IMU) and communication over Bluetooth Low Energy (BLE) protocol. It is self-powered by miniature solar cells and can perform maximum power point tracking (MPPT). Its integrated energy-storage device combines the longevity and power density of supercapacitors with the relatively flat discharge curve of batteries. Our power-ground gating circuit minimizes leakage current during sleep mode and is used in conjunction with the real-time-clock for duty cycling. Experimental results show EcoMicro to be operational and efficient for a class of wireless sensing applications.
{"title":"EcoMicro","authors":"Cheng-Ting Lee, Yun-Hao Liang, Pai H. Chou, A. Gorji, Seyede Mahya Safavi, Wen-Chan Shih, Wen-Tsuen Chen","doi":"10.1145/3218603.3218648","DOIUrl":"https://doi.org/10.1145/3218603.3218648","url":null,"abstract":"This paper describes EcoMicro, a miniature, self-powered, wireless inertial-sensing node in the volume of 8 x 13 x 9.5 mm3, including energy storage and solar cells. It is smaller than existing systems with similar functionality while retaining rich functionality and efficiency. It is capable of measuring motion using a inertial measurement unit (IMU) and communication over Bluetooth Low Energy (BLE) protocol. It is self-powered by miniature solar cells and can perform maximum power point tracking (MPPT). Its integrated energy-storage device combines the longevity and power density of supercapacitors with the relatively flat discharge curve of batteries. Our power-ground gating circuit minimizes leakage current during sleep mode and is used in conjunction with the real-time-clock for duty cycling. Experimental results show EcoMicro to be operational and efficient for a class of wireless sensing applications.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75432139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Design Optimization of 3D Multi-Processor System-on-Chip with Integrated Flow Cell Arrays 集成流池阵列的三维多处理器片上系统设计优化
A. Andreev, Fulya Kaplan, Marina Zapater, A. Coskun, David Atienza Alonso
Integrated flow cell array (FCA) is an emerging technology, targeting the cooling and power delivery challenges of modern 2D/3D Multi-Processor Systems-on-Chip (MPSoCs). In FCA, electrolytic solutions are pumped through microchannels etched in the silicon of the chips, removing heat from the system, while, at the same time, generating power on-chip. In this work, we explore the impact of FCA system design on various 3D architectures and propose a methodology to optimize a 3D MPSoC with integrated FCA to run a given workload in the most energy-efficient way. Our results show that an optimized configuration can save up to 50% energy with respect to sub-optimal 3D MPSoC configurations.
集成液流电池阵列(FCA)是一项新兴技术,旨在解决现代2D/3D多处理器片上系统(mpsoc)在冷却和供电方面的挑战。在FCA中,电解溶液通过蚀刻在芯片硅上的微通道泵送,从系统中除去热量,同时在芯片上发电。在这项工作中,我们探讨了FCA系统设计对各种3D架构的影响,并提出了一种方法来优化具有集成FCA的3D MPSoC,以最节能的方式运行给定的工作负载。我们的研究结果表明,与次优的3D MPSoC配置相比,优化后的配置可以节省高达50%的能量。
{"title":"Design Optimization of 3D Multi-Processor System-on-Chip with Integrated Flow Cell Arrays","authors":"A. Andreev, Fulya Kaplan, Marina Zapater, A. Coskun, David Atienza Alonso","doi":"10.1145/3218603.3218606","DOIUrl":"https://doi.org/10.1145/3218603.3218606","url":null,"abstract":"Integrated flow cell array (FCA) is an emerging technology, targeting the cooling and power delivery challenges of modern 2D/3D Multi-Processor Systems-on-Chip (MPSoCs). In FCA, electrolytic solutions are pumped through microchannels etched in the silicon of the chips, removing heat from the system, while, at the same time, generating power on-chip. In this work, we explore the impact of FCA system design on various 3D architectures and propose a methodology to optimize a 3D MPSoC with integrated FCA to run a given workload in the most energy-efficient way. Our results show that an optimized configuration can save up to 50% energy with respect to sub-optimal 3D MPSoC configurations.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90492232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Multiple Combined Write-Read Peripheral Assists in 6T FinFET SRAMs for Low-VMIN IoT and Cognitive Applications 用于低vmin物联网和认知应用的6T FinFET sram中的多个组合读写外设辅助
A. Banerjee, Sumanth Kamineni, B. Calhoun
Battery-operated or energy-harvested IoT and cognitive SoCs in modern FinFET processes prefer the use of low-VMIN SRAMs for ultra-low power (ULP) operations. However, the 1:1:1 high-density (HD) FinFET 6T bitcell faces challenges in achieving a lower VMIN across process variation. The 6T bitcell VMIN improves either by increasing the size of the bitcell or by using combinations of peripheral assists (PAs) since a single PA cannot achieve the best VMIN across process variation. State-of-the-art works show some combinations of write and read PAs that lower the VMIN of 6T FinFET SRAMs. However, the better combinations of PA for 14nm HD 6T FinFET SRAMs are unknown. This work compares all the possible dual combinations of PAs and reveals the better ones. We show that in a usual column mux scenario the combination of negative bitline with VDD boosting and VDD collapse with VDD boosting in a proportion of 14% and 6% (total 20%), respectively, maximize the static VMIN improvement close to 191mV for ULP IoT and cognitive applications. We also show that a combination of wordline boosting with negative bitline and wordline boosting with VSS lowering achieve a 150mV and 25mV of dynamic VMIN improvement at the 5GHz frequency for the worst-case write and read corners, respectively, beating other combinations.
在现代FinFET工艺中,电池供电或能量收集的物联网和认知soc更倾向于使用低vmin sram进行超低功耗(ULP)操作。然而,1:1:1高密度(HD) FinFET 6T位单元在实现低VMIN跨工艺变化方面面临挑战。6T位单元VMIN通过增加位单元的大小或使用外围辅助(PA)的组合来改进,因为单个PA不能实现最佳的VMIN。最新的研究表明,一些写入和读取PAs的组合可以降低6T FinFET sram的VMIN。然而,对于14nm HD 6T FinFET sram来说,PA的更好组合是未知的。这项工作比较了所有可能的PAs双组合,并揭示了较好的组合。我们表明,在通常的列复用场景中,负位线与VDD增强的组合和VDD崩溃与VDD增强的组合分别占14%和6%(总占20%),对于ULP物联网和认知应用,最大限度地提高了接近191mV的静态VMIN。我们还表明,在最坏情况下,在5GHz频率下,带负位线的字线增强和带VSS降低的字线增强组合分别实现了150mV和25mV的动态VMIN改进,优于其他组合。
{"title":"Multiple Combined Write-Read Peripheral Assists in 6T FinFET SRAMs for Low-VMIN IoT and Cognitive Applications","authors":"A. Banerjee, Sumanth Kamineni, B. Calhoun","doi":"10.1145/3218603.3218628","DOIUrl":"https://doi.org/10.1145/3218603.3218628","url":null,"abstract":"Battery-operated or energy-harvested IoT and cognitive SoCs in modern FinFET processes prefer the use of low-VMIN SRAMs for ultra-low power (ULP) operations. However, the 1:1:1 high-density (HD) FinFET 6T bitcell faces challenges in achieving a lower VMIN across process variation. The 6T bitcell VMIN improves either by increasing the size of the bitcell or by using combinations of peripheral assists (PAs) since a single PA cannot achieve the best VMIN across process variation. State-of-the-art works show some combinations of write and read PAs that lower the VMIN of 6T FinFET SRAMs. However, the better combinations of PA for 14nm HD 6T FinFET SRAMs are unknown. This work compares all the possible dual combinations of PAs and reveals the better ones. We show that in a usual column mux scenario the combination of negative bitline with VDD boosting and VDD collapse with VDD boosting in a proportion of 14% and 6% (total 20%), respectively, maximize the static VMIN improvement close to 191mV for ULP IoT and cognitive applications. We also show that a combination of wordline boosting with negative bitline and wordline boosting with VSS lowering achieve a 150mV and 25mV of dynamic VMIN improvement at the 5GHz frequency for the worst-case write and read corners, respectively, beating other combinations.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79108014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
RMAC RMAC
M. Imani, Ricardo Garcia, Saransh Gupta, Tajana Rosing
Approximate computing is a way to build fast and energy efficient systems, which provides responses of good enough quality tailored for different purposes. In this paper, we propose a novel approximate floating point multiplier which efficiently multiplies two floating numbers and yields a high precision product. RMAC approximates the costly mantissa multiplication to a simple addition between the mantissa of input operands. To tune the level of accuracy, RMAC looks at the first bit of the input mantissas as well as the first N bits of the result of addition to dynamically estimate the maximum multiplication error rate. Then, RMAC decides to either accept the approximate result or re-execute the exact multiplication. Depending on the value of N, the proposed RMAC can be configured to achieve different levels of accuracy. We integrate the proposed RMAC in AMD southern Island GPU, by replacing RMAC with the existing floating point units. We test the efficiency and accuracy of the enhanced GPU on a wide range of applications including multimedia and machine learning applications. Our evaluations show that a GPU enhanced by the proposed RMAC can achieve 5.2x energydelay product improvement as opposed to GPU using conventional FPUs while ensuring less than 2% quality loss. Comparing our approach with other state-of-the-art approximate multipliers shows that RMAC can achieve 3.1x faster and 1.8x more energy efficient computations while providing the same quality of service.
{"title":"RMAC","authors":"M. Imani, Ricardo Garcia, Saransh Gupta, Tajana Rosing","doi":"10.1145/3218603.3218621","DOIUrl":"https://doi.org/10.1145/3218603.3218621","url":null,"abstract":"Approximate computing is a way to build fast and energy efficient systems, which provides responses of good enough quality tailored for different purposes. In this paper, we propose a novel approximate floating point multiplier which efficiently multiplies two floating numbers and yields a high precision product. RMAC approximates the costly mantissa multiplication to a simple addition between the mantissa of input operands. To tune the level of accuracy, RMAC looks at the first bit of the input mantissas as well as the first N bits of the result of addition to dynamically estimate the maximum multiplication error rate. Then, RMAC decides to either accept the approximate result or re-execute the exact multiplication. Depending on the value of N, the proposed RMAC can be configured to achieve different levels of accuracy. We integrate the proposed RMAC in AMD southern Island GPU, by replacing RMAC with the existing floating point units. We test the efficiency and accuracy of the enhanced GPU on a wide range of applications including multimedia and machine learning applications. Our evaluations show that a GPU enhanced by the proposed RMAC can achieve 5.2x energydelay product improvement as opposed to GPU using conventional FPUs while ensuring less than 2% quality loss. Comparing our approach with other state-of-the-art approximate multipliers shows that RMAC can achieve 3.1x faster and 1.8x more energy efficient computations while providing the same quality of service.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80028691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 27
Efficient and Secure Group Key Management in IoT using Multistage Interconnected PUF 使用多级互联PUF的物联网中高效安全的组密钥管理
Hongxiang Gu, M. Potkonjak
Secure group-oriented communication is crucial to a wide range of applications in Internet of Things (IoT). Security problems related to group-oriented communications in IoT-based applications placed in a privacy-sensitive environment have become a major concern along with the development of the technology. Unfortunately, many IoT devices are designed to be portable and light-weight; thus, their functionalities, including security modules, are heavily constrained by the limited energy resources (e.g., battery capacity). To address these problems, we propose a group key management scheme based on a novel physically unclonable function (PUF) design: multistage interconnected PUF (MIPUF) to secure group communications in an energy-constrained environment. Our design is capable of performing key management tasks such as key distribution, key storage and rekeying securely and efficiently. We show that our design is secure against multiple attack methods and our experimental results show that our design saves 47.33% of energy globally comparing to state-of-the-art Elliptic-curve cryptography (ECC)-based key management scheme on average.
安全的群体通信对于物联网(IoT)的广泛应用至关重要。随着物联网技术的发展,与隐私敏感环境中基于物联网的应用中面向组的通信相关的安全问题已经成为人们关注的主要问题。不幸的是,许多物联网设备的设计都是便携轻便的;因此,它们的功能,包括安全模块,受到有限的能源资源(例如,电池容量)的严重限制。为了解决这些问题,我们提出了一种基于新型物理不可克隆功能(PUF)设计的组密钥管理方案:多级互联PUF (MIPUF),以保护能量受限环境下的组通信。我们的设计能够安全有效地执行密钥分发、密钥存储和密钥更新等密钥管理任务。实验结果表明,与最先进的基于椭圆曲线加密(ECC)的密钥管理方案相比,我们的设计在全球范围内平均节省了47.33%的能源。
{"title":"Efficient and Secure Group Key Management in IoT using Multistage Interconnected PUF","authors":"Hongxiang Gu, M. Potkonjak","doi":"10.1145/3218603.3218646","DOIUrl":"https://doi.org/10.1145/3218603.3218646","url":null,"abstract":"Secure group-oriented communication is crucial to a wide range of applications in Internet of Things (IoT). Security problems related to group-oriented communications in IoT-based applications placed in a privacy-sensitive environment have become a major concern along with the development of the technology. Unfortunately, many IoT devices are designed to be portable and light-weight; thus, their functionalities, including security modules, are heavily constrained by the limited energy resources (e.g., battery capacity). To address these problems, we propose a group key management scheme based on a novel physically unclonable function (PUF) design: multistage interconnected PUF (MIPUF) to secure group communications in an energy-constrained environment. Our design is capable of performing key management tasks such as key distribution, key storage and rekeying securely and efficiently. We show that our design is secure against multiple attack methods and our experimental results show that our design saves 47.33% of energy globally comparing to state-of-the-art Elliptic-curve cryptography (ECC)-based key management scheme on average.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74284187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
SPONGE: A Scalable Pivot-based On/Off Gating Engine for Reducing Static Power in NoC Routers 海绵:一个可扩展的基于枢轴的开关门控引擎,用于降低NoC路由器的静态功率
Hossein Farrokhbakht, Hadi Mardani Kamali, Natalie D. Enright Jerger, S. Hessabi
Due to high aggregate idle time of Networks-on-Chip (NoCs) routers in practical applications, power-gating techniques have been proposed to combat the ever-increasing ratio of static power. Nevertheless, the sporadic packet arrivals compromise the effectiveness of power-gating by incurring significant latency and energy overhead. In this paper, we propose a Scalable Pivot-based On/Off Gating Engine (SPONGE) which efficiently manages power-gating decisions and routing mechanism by adaptively selecting a small set of powered-on columns of routers and keeping the others in power-gated state. To this end, a router architecture augmented with a novel routing algorithm is proposed in which a packet can traverse powered-off routers without waking them up, and can only turn in predetermined powered-on routers. Experimental results on SPLASH-2 benchmarks demonstrate that, compared to the conventional power-gating method, SPONGE on average not only improves static power consumption by 81.7%, it also improves average packet latency by 63%.
由于片上网络(noc)路由器在实际应用中的总空闲时间较高,功率门控技术被提出以对抗不断增加的静态功率比率。然而,零星的数据包到达会导致显著的延迟和能量开销,从而影响功率门控的有效性。本文提出了一种可扩展的基于枢轴的开/关门控引擎(SPONGE),该引擎通过自适应地选择一小部分上电的路由器列,并使其他列保持上电的状态,从而有效地管理电源门控决策和路由机制。为此,提出了一种新的路由算法增强的路由器架构,该架构使数据包可以不唤醒已关闭的路由器,并且只能在预定的已打开的路由器中转过去。在SPLASH-2基准测试上的实验结果表明,与传统的功率门控方法相比,SPONGE不仅平均提高了81.7%的静态功耗,而且平均数据包延迟提高了63%。
{"title":"SPONGE: A Scalable Pivot-based On/Off Gating Engine for Reducing Static Power in NoC Routers","authors":"Hossein Farrokhbakht, Hadi Mardani Kamali, Natalie D. Enright Jerger, S. Hessabi","doi":"10.1145/3218603.3218635","DOIUrl":"https://doi.org/10.1145/3218603.3218635","url":null,"abstract":"Due to high aggregate idle time of Networks-on-Chip (NoCs) routers in practical applications, power-gating techniques have been proposed to combat the ever-increasing ratio of static power. Nevertheless, the sporadic packet arrivals compromise the effectiveness of power-gating by incurring significant latency and energy overhead. In this paper, we propose a Scalable Pivot-based On/Off Gating Engine (SPONGE) which efficiently manages power-gating decisions and routing mechanism by adaptively selecting a small set of powered-on columns of routers and keeping the others in power-gated state. To this end, a router architecture augmented with a novel routing algorithm is proposed in which a packet can traverse powered-off routers without waking them up, and can only turn in predetermined powered-on routers. Experimental results on SPLASH-2 benchmarks demonstrate that, compared to the conventional power-gating method, SPONGE on average not only improves static power consumption by 81.7%, it also improves average packet latency by 63%.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74042981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Road to High-Performance 3D ICs: Performance Optimization Methodologies for Monolithic 3D ICs 高性能3D集成电路之路:单片3D集成电路的性能优化方法
Kyungwook Chang, S. Pentapati, D. Shim, S. Lim
As we approach the limits of 2D device scaling, monolithic 3D IC (M3D) has emerged as a potential solution offering performance and power benefits. Although various studies have been done to increase power savings of M3D designs, efforts to improve their performance are rarely made. In this paper, we, for the first time, perform in-depth analysis of the factors that affect the performance of M3D, and present methodologies to improve the performance. Our methodologies outperform the state-of-the-art M3D design flow by offering 15.6% performance improvement and 16.2% energy-delay product (EDP) benefit over 2D designs.
随着我们接近2D器件扩展的极限,单片3D IC (M3D)已经成为提供性能和功耗优势的潜在解决方案。尽管已经进行了各种各样的研究来提高M3D设计的功耗节约,但很少有人努力提高其性能。在本文中,我们首次对影响M3D性能的因素进行了深入的分析,并提出了提高性能的方法。我们的方法优于最先进的M3D设计流程,比2D设计提供15.6%的性能改进和16.2%的能量延迟产品(EDP)优势。
{"title":"Road to High-Performance 3D ICs: Performance Optimization Methodologies for Monolithic 3D ICs","authors":"Kyungwook Chang, S. Pentapati, D. Shim, S. Lim","doi":"10.1145/3218603.3218636","DOIUrl":"https://doi.org/10.1145/3218603.3218636","url":null,"abstract":"As we approach the limits of 2D device scaling, monolithic 3D IC (M3D) has emerged as a potential solution offering performance and power benefits. Although various studies have been done to increase power savings of M3D designs, efforts to improve their performance are rarely made. In this paper, we, for the first time, perform in-depth analysis of the factors that affect the performance of M3D, and present methodologies to improve the performance. Our methodologies outperform the state-of-the-art M3D design flow by offering 15.6% performance improvement and 16.2% energy-delay product (EDP) benefit over 2D designs.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77302552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
A Low-power 4096x2160@30fps H.265/HEVC Video Encoder for Smart Video Surveillance 用于智能视频监控的低功耗4096x2160@30fps H.265/HEVC视频编码器
Ke Xu, Yu Li, Bo Huang, Xiangkai Liu, Hong Wang, Zhuoyan Wu, Zhanpeng Yan, Xueying Tu, Tongqing Wu, Daibing Zeng
This paper presents the design and VLSI implementation of a low-power HEVC main profile encoder, which is able to process up to 4096x2160@30fps 4:2:0 encoding in real-time with five-stage pipeline architecture. A pyramid ME (Motion Estimation) engine is employed to reduce search complexity. To compensate for the video sequences with fast moving objects, GME (Global Motion Estimation) are introduced to alleviate the effect of limited search range. We also implement an alternative 5x5 search along with 3x3 to boost video quality. For intra mode decision, original pixels, instead of reconstructed ones are used to reduce pipeline stall. The encoder supports DVFS (Dynamic Voltage and Frequency Scaling) and features three operating modes, which helps to reduce power consumption by 25%. Scalable quality that trades encoding quality for power by reducing size of search range and intra prediction candidates, achieves 11.4% power reduction with 3.5% quality degradation. Furthermore, a lossless frame buffer compression is proposed which reduced DDR bandwidth by 49.1% and power consumption by 13.6%. The entire video surveillance SoC is fabricated with TSMC 28nm technology with 1.96 mm2 area. It consumes 2.88M logic gates and 117KB SRAM. The measured power consumption is 103mW at 350MHz for 4K encoding with high-quality mode. The 0.39nJ/pixel of energy efficiency of this work, which achieves 42% ~ 97% power reduction as compared with reference designs, make it ideal for real-time low-power smart video surveillance applications.
本文介绍了一种低功耗HEVC主轮廓编码器的设计和VLSI实现,该编码器采用五级流水线架构,能够实时处理4096x2160@30fps 4:2:0的编码。采用金字塔运动估计引擎来降低搜索复杂度。为了补偿视频序列中快速运动的物体,引入了全局运动估计(GME)来缓解搜索范围有限的影响。我们还实现了一个替代的5x5搜索和3x3搜索,以提高视频质量。对于模式内决策,使用原始像素而不是重构像素来减少管道失速。该编码器支持DVFS(动态电压和频率缩放),并具有三种工作模式,有助于降低25%的功耗。可扩展的质量通过减少搜索范围和内部预测候选的大小来换取编码质量,实现了11.4%的功耗降低和3.5%的质量降低。在此基础上,提出了一种无损帧缓冲压缩方法,使DDR带宽降低49.1%,功耗降低13.6%。整个视频监控SoC采用台积电28纳米技术制造,面积为1.96 mm2。它消耗288万个逻辑门和117KB SRAM。采用高质量模式进行4K编码时,在350MHz下的实测功耗为103mW。本工作的能效为0.39nJ/pixel,与参考设计相比,功耗降低42% ~ 97%,是实时低功耗智能视频监控应用的理想选择。
{"title":"A Low-power 4096x2160@30fps H.265/HEVC Video Encoder for Smart Video Surveillance","authors":"Ke Xu, Yu Li, Bo Huang, Xiangkai Liu, Hong Wang, Zhuoyan Wu, Zhanpeng Yan, Xueying Tu, Tongqing Wu, Daibing Zeng","doi":"10.1145/3218603.3218604","DOIUrl":"https://doi.org/10.1145/3218603.3218604","url":null,"abstract":"This paper presents the design and VLSI implementation of a low-power HEVC main profile encoder, which is able to process up to 4096x2160@30fps 4:2:0 encoding in real-time with five-stage pipeline architecture. A pyramid ME (Motion Estimation) engine is employed to reduce search complexity. To compensate for the video sequences with fast moving objects, GME (Global Motion Estimation) are introduced to alleviate the effect of limited search range. We also implement an alternative 5x5 search along with 3x3 to boost video quality. For intra mode decision, original pixels, instead of reconstructed ones are used to reduce pipeline stall. The encoder supports DVFS (Dynamic Voltage and Frequency Scaling) and features three operating modes, which helps to reduce power consumption by 25%. Scalable quality that trades encoding quality for power by reducing size of search range and intra prediction candidates, achieves 11.4% power reduction with 3.5% quality degradation. Furthermore, a lossless frame buffer compression is proposed which reduced DDR bandwidth by 49.1% and power consumption by 13.6%. The entire video surveillance SoC is fabricated with TSMC 28nm technology with 1.96 mm2 area. It consumes 2.88M logic gates and 117KB SRAM. The measured power consumption is 103mW at 350MHz for 4K encoding with high-quality mode. The 0.39nJ/pixel of energy efficiency of this work, which achieves 42% ~ 97% power reduction as compared with reference designs, make it ideal for real-time low-power smart video surveillance applications.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85661034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
期刊
Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1