Chih-Kai Kang, Chun-Han Lin, P. Hsiu, Ming-Syan Chen
Self-powered intermittent systems featuring nonvolatile processors (NVPs) allow for accumulative execution in unstable power environments. However, frequent power failures may cause incorrect NVP execution results due to invalid data generated intermittently. This paper presents a HW/SW co-design, called HomeRun, to guarantee atomicity by ensuring that an uninterruptible program section can be run through at one execution. We design a HW module to ensure that a power pulse is sufficient for an atomic section, and develop a SW mechanism for programmers to protect atomic sections. The proposed design is validated through the development of a prototype pattern locking system. Experimental results demonstrate that the proposed design can completely guarantee atomicity and significantly improve the energy utilization of self-powered intermittent systems.
{"title":"HomeRun","authors":"Chih-Kai Kang, Chun-Han Lin, P. Hsiu, Ming-Syan Chen","doi":"10.1145/3218603.3218633","DOIUrl":"https://doi.org/10.1145/3218603.3218633","url":null,"abstract":"Self-powered intermittent systems featuring nonvolatile processors (NVPs) allow for accumulative execution in unstable power environments. However, frequent power failures may cause incorrect NVP execution results due to invalid data generated intermittently. This paper presents a HW/SW co-design, called HomeRun, to guarantee atomicity by ensuring that an uninterruptible program section can be run through at one execution. We design a HW module to ensure that a power pulse is sufficient for an atomic section, and develop a SW mechanism for programmers to protect atomic sections. The proposed design is validated through the development of a prototype pattern locking system. Experimental results demonstrate that the proposed design can completely guarantee atomicity and significantly improve the energy utilization of self-powered intermittent systems.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78372311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sarvenaz Tajasob, Morteza Rezaalipour, M. Dehyadegari, M. N. Bojnordi
Energy-efficiency has become a major concern in designing computer systems. One of the most promising solutions to enhance power and energy-efficiency in error tolerant applications is approximate computing that balances accuracy, area, delay, and power consumption based on the computational needs. By trading accuracy of computation, approximate computing may achieve significant improvements in speed, power, and area consumption. Adders are important arithmetic units widely used in almost every digital processing system, which contribute to significant amounts of power dissipation. With the emergence of deep learning tasks and fault tolerant big data processing in every aspect of today's computing, the demand for low-power and energy-efficient approximate adders has increased significantly. Numerous designs have been proposed in the literature that build multi-bit adders using novel approximate full adder circuits. Regrettably, relying on single-bit building blocks only limits the design space of approximate adders and prevents the designers from achieving the most significant benefits of approximate circuits. This paper presents a novel approach to designing imprecise multi-bit adders, based on four novel approximate 2 and 3-bit adder building blocks. The proposed circuits are evaluated and compared with the existing low power adders in terms of various design characteristics, such as area, delay, power, and error tolerance. Our simulation results indicate that the proposed adders achieve more than 60% reduction in power and area consumption compared to the state-of-the-art approximate adders while introducing 12-17% less error in computation.
{"title":"Designing Efficient Imprecise Adders using Multi-bit Approximate Building Blocks","authors":"Sarvenaz Tajasob, Morteza Rezaalipour, M. Dehyadegari, M. N. Bojnordi","doi":"10.1145/3218603.3218638","DOIUrl":"https://doi.org/10.1145/3218603.3218638","url":null,"abstract":"Energy-efficiency has become a major concern in designing computer systems. One of the most promising solutions to enhance power and energy-efficiency in error tolerant applications is approximate computing that balances accuracy, area, delay, and power consumption based on the computational needs. By trading accuracy of computation, approximate computing may achieve significant improvements in speed, power, and area consumption. Adders are important arithmetic units widely used in almost every digital processing system, which contribute to significant amounts of power dissipation. With the emergence of deep learning tasks and fault tolerant big data processing in every aspect of today's computing, the demand for low-power and energy-efficient approximate adders has increased significantly. Numerous designs have been proposed in the literature that build multi-bit adders using novel approximate full adder circuits. Regrettably, relying on single-bit building blocks only limits the design space of approximate adders and prevents the designers from achieving the most significant benefits of approximate circuits. This paper presents a novel approach to designing imprecise multi-bit adders, based on four novel approximate 2 and 3-bit adder building blocks. The proposed circuits are evaluated and compared with the existing low power adders in terms of various design characteristics, such as area, delay, power, and error tolerance. Our simulation results indicate that the proposed adders achieve more than 60% reduction in power and area consumption compared to the state-of-the-art approximate adders while introducing 12-17% less error in computation.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80403239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cheng-Ting Lee, Yun-Hao Liang, Pai H. Chou, A. Gorji, Seyede Mahya Safavi, Wen-Chan Shih, Wen-Tsuen Chen
This paper describes EcoMicro, a miniature, self-powered, wireless inertial-sensing node in the volume of 8 x 13 x 9.5 mm3, including energy storage and solar cells. It is smaller than existing systems with similar functionality while retaining rich functionality and efficiency. It is capable of measuring motion using a inertial measurement unit (IMU) and communication over Bluetooth Low Energy (BLE) protocol. It is self-powered by miniature solar cells and can perform maximum power point tracking (MPPT). Its integrated energy-storage device combines the longevity and power density of supercapacitors with the relatively flat discharge curve of batteries. Our power-ground gating circuit minimizes leakage current during sleep mode and is used in conjunction with the real-time-clock for duty cycling. Experimental results show EcoMicro to be operational and efficient for a class of wireless sensing applications.
{"title":"EcoMicro","authors":"Cheng-Ting Lee, Yun-Hao Liang, Pai H. Chou, A. Gorji, Seyede Mahya Safavi, Wen-Chan Shih, Wen-Tsuen Chen","doi":"10.1145/3218603.3218648","DOIUrl":"https://doi.org/10.1145/3218603.3218648","url":null,"abstract":"This paper describes EcoMicro, a miniature, self-powered, wireless inertial-sensing node in the volume of 8 x 13 x 9.5 mm3, including energy storage and solar cells. It is smaller than existing systems with similar functionality while retaining rich functionality and efficiency. It is capable of measuring motion using a inertial measurement unit (IMU) and communication over Bluetooth Low Energy (BLE) protocol. It is self-powered by miniature solar cells and can perform maximum power point tracking (MPPT). Its integrated energy-storage device combines the longevity and power density of supercapacitors with the relatively flat discharge curve of batteries. Our power-ground gating circuit minimizes leakage current during sleep mode and is used in conjunction with the real-time-clock for duty cycling. Experimental results show EcoMicro to be operational and efficient for a class of wireless sensing applications.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75432139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Andreev, Fulya Kaplan, Marina Zapater, A. Coskun, David Atienza Alonso
Integrated flow cell array (FCA) is an emerging technology, targeting the cooling and power delivery challenges of modern 2D/3D Multi-Processor Systems-on-Chip (MPSoCs). In FCA, electrolytic solutions are pumped through microchannels etched in the silicon of the chips, removing heat from the system, while, at the same time, generating power on-chip. In this work, we explore the impact of FCA system design on various 3D architectures and propose a methodology to optimize a 3D MPSoC with integrated FCA to run a given workload in the most energy-efficient way. Our results show that an optimized configuration can save up to 50% energy with respect to sub-optimal 3D MPSoC configurations.
{"title":"Design Optimization of 3D Multi-Processor System-on-Chip with Integrated Flow Cell Arrays","authors":"A. Andreev, Fulya Kaplan, Marina Zapater, A. Coskun, David Atienza Alonso","doi":"10.1145/3218603.3218606","DOIUrl":"https://doi.org/10.1145/3218603.3218606","url":null,"abstract":"Integrated flow cell array (FCA) is an emerging technology, targeting the cooling and power delivery challenges of modern 2D/3D Multi-Processor Systems-on-Chip (MPSoCs). In FCA, electrolytic solutions are pumped through microchannels etched in the silicon of the chips, removing heat from the system, while, at the same time, generating power on-chip. In this work, we explore the impact of FCA system design on various 3D architectures and propose a methodology to optimize a 3D MPSoC with integrated FCA to run a given workload in the most energy-efficient way. Our results show that an optimized configuration can save up to 50% energy with respect to sub-optimal 3D MPSoC configurations.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90492232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Battery-operated or energy-harvested IoT and cognitive SoCs in modern FinFET processes prefer the use of low-VMIN SRAMs for ultra-low power (ULP) operations. However, the 1:1:1 high-density (HD) FinFET 6T bitcell faces challenges in achieving a lower VMIN across process variation. The 6T bitcell VMIN improves either by increasing the size of the bitcell or by using combinations of peripheral assists (PAs) since a single PA cannot achieve the best VMIN across process variation. State-of-the-art works show some combinations of write and read PAs that lower the VMIN of 6T FinFET SRAMs. However, the better combinations of PA for 14nm HD 6T FinFET SRAMs are unknown. This work compares all the possible dual combinations of PAs and reveals the better ones. We show that in a usual column mux scenario the combination of negative bitline with VDD boosting and VDD collapse with VDD boosting in a proportion of 14% and 6% (total 20%), respectively, maximize the static VMIN improvement close to 191mV for ULP IoT and cognitive applications. We also show that a combination of wordline boosting with negative bitline and wordline boosting with VSS lowering achieve a 150mV and 25mV of dynamic VMIN improvement at the 5GHz frequency for the worst-case write and read corners, respectively, beating other combinations.
在现代FinFET工艺中,电池供电或能量收集的物联网和认知soc更倾向于使用低vmin sram进行超低功耗(ULP)操作。然而,1:1:1高密度(HD) FinFET 6T位单元在实现低VMIN跨工艺变化方面面临挑战。6T位单元VMIN通过增加位单元的大小或使用外围辅助(PA)的组合来改进,因为单个PA不能实现最佳的VMIN。最新的研究表明,一些写入和读取PAs的组合可以降低6T FinFET sram的VMIN。然而,对于14nm HD 6T FinFET sram来说,PA的更好组合是未知的。这项工作比较了所有可能的PAs双组合,并揭示了较好的组合。我们表明,在通常的列复用场景中,负位线与VDD增强的组合和VDD崩溃与VDD增强的组合分别占14%和6%(总占20%),对于ULP物联网和认知应用,最大限度地提高了接近191mV的静态VMIN。我们还表明,在最坏情况下,在5GHz频率下,带负位线的字线增强和带VSS降低的字线增强组合分别实现了150mV和25mV的动态VMIN改进,优于其他组合。
{"title":"Multiple Combined Write-Read Peripheral Assists in 6T FinFET SRAMs for Low-VMIN IoT and Cognitive Applications","authors":"A. Banerjee, Sumanth Kamineni, B. Calhoun","doi":"10.1145/3218603.3218628","DOIUrl":"https://doi.org/10.1145/3218603.3218628","url":null,"abstract":"Battery-operated or energy-harvested IoT and cognitive SoCs in modern FinFET processes prefer the use of low-VMIN SRAMs for ultra-low power (ULP) operations. However, the 1:1:1 high-density (HD) FinFET 6T bitcell faces challenges in achieving a lower VMIN across process variation. The 6T bitcell VMIN improves either by increasing the size of the bitcell or by using combinations of peripheral assists (PAs) since a single PA cannot achieve the best VMIN across process variation. State-of-the-art works show some combinations of write and read PAs that lower the VMIN of 6T FinFET SRAMs. However, the better combinations of PA for 14nm HD 6T FinFET SRAMs are unknown. This work compares all the possible dual combinations of PAs and reveals the better ones. We show that in a usual column mux scenario the combination of negative bitline with VDD boosting and VDD collapse with VDD boosting in a proportion of 14% and 6% (total 20%), respectively, maximize the static VMIN improvement close to 191mV for ULP IoT and cognitive applications. We also show that a combination of wordline boosting with negative bitline and wordline boosting with VSS lowering achieve a 150mV and 25mV of dynamic VMIN improvement at the 5GHz frequency for the worst-case write and read corners, respectively, beating other combinations.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79108014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Imani, Ricardo Garcia, Saransh Gupta, Tajana Rosing
Approximate computing is a way to build fast and energy efficient systems, which provides responses of good enough quality tailored for different purposes. In this paper, we propose a novel approximate floating point multiplier which efficiently multiplies two floating numbers and yields a high precision product. RMAC approximates the costly mantissa multiplication to a simple addition between the mantissa of input operands. To tune the level of accuracy, RMAC looks at the first bit of the input mantissas as well as the first N bits of the result of addition to dynamically estimate the maximum multiplication error rate. Then, RMAC decides to either accept the approximate result or re-execute the exact multiplication. Depending on the value of N, the proposed RMAC can be configured to achieve different levels of accuracy. We integrate the proposed RMAC in AMD southern Island GPU, by replacing RMAC with the existing floating point units. We test the efficiency and accuracy of the enhanced GPU on a wide range of applications including multimedia and machine learning applications. Our evaluations show that a GPU enhanced by the proposed RMAC can achieve 5.2x energydelay product improvement as opposed to GPU using conventional FPUs while ensuring less than 2% quality loss. Comparing our approach with other state-of-the-art approximate multipliers shows that RMAC can achieve 3.1x faster and 1.8x more energy efficient computations while providing the same quality of service.
{"title":"RMAC","authors":"M. Imani, Ricardo Garcia, Saransh Gupta, Tajana Rosing","doi":"10.1145/3218603.3218621","DOIUrl":"https://doi.org/10.1145/3218603.3218621","url":null,"abstract":"Approximate computing is a way to build fast and energy efficient systems, which provides responses of good enough quality tailored for different purposes. In this paper, we propose a novel approximate floating point multiplier which efficiently multiplies two floating numbers and yields a high precision product. RMAC approximates the costly mantissa multiplication to a simple addition between the mantissa of input operands. To tune the level of accuracy, RMAC looks at the first bit of the input mantissas as well as the first N bits of the result of addition to dynamically estimate the maximum multiplication error rate. Then, RMAC decides to either accept the approximate result or re-execute the exact multiplication. Depending on the value of N, the proposed RMAC can be configured to achieve different levels of accuracy. We integrate the proposed RMAC in AMD southern Island GPU, by replacing RMAC with the existing floating point units. We test the efficiency and accuracy of the enhanced GPU on a wide range of applications including multimedia and machine learning applications. Our evaluations show that a GPU enhanced by the proposed RMAC can achieve 5.2x energydelay product improvement as opposed to GPU using conventional FPUs while ensuring less than 2% quality loss. Comparing our approach with other state-of-the-art approximate multipliers shows that RMAC can achieve 3.1x faster and 1.8x more energy efficient computations while providing the same quality of service.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80028691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Secure group-oriented communication is crucial to a wide range of applications in Internet of Things (IoT). Security problems related to group-oriented communications in IoT-based applications placed in a privacy-sensitive environment have become a major concern along with the development of the technology. Unfortunately, many IoT devices are designed to be portable and light-weight; thus, their functionalities, including security modules, are heavily constrained by the limited energy resources (e.g., battery capacity). To address these problems, we propose a group key management scheme based on a novel physically unclonable function (PUF) design: multistage interconnected PUF (MIPUF) to secure group communications in an energy-constrained environment. Our design is capable of performing key management tasks such as key distribution, key storage and rekeying securely and efficiently. We show that our design is secure against multiple attack methods and our experimental results show that our design saves 47.33% of energy globally comparing to state-of-the-art Elliptic-curve cryptography (ECC)-based key management scheme on average.
{"title":"Efficient and Secure Group Key Management in IoT using Multistage Interconnected PUF","authors":"Hongxiang Gu, M. Potkonjak","doi":"10.1145/3218603.3218646","DOIUrl":"https://doi.org/10.1145/3218603.3218646","url":null,"abstract":"Secure group-oriented communication is crucial to a wide range of applications in Internet of Things (IoT). Security problems related to group-oriented communications in IoT-based applications placed in a privacy-sensitive environment have become a major concern along with the development of the technology. Unfortunately, many IoT devices are designed to be portable and light-weight; thus, their functionalities, including security modules, are heavily constrained by the limited energy resources (e.g., battery capacity). To address these problems, we propose a group key management scheme based on a novel physically unclonable function (PUF) design: multistage interconnected PUF (MIPUF) to secure group communications in an energy-constrained environment. Our design is capable of performing key management tasks such as key distribution, key storage and rekeying securely and efficiently. We show that our design is secure against multiple attack methods and our experimental results show that our design saves 47.33% of energy globally comparing to state-of-the-art Elliptic-curve cryptography (ECC)-based key management scheme on average.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74284187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hossein Farrokhbakht, Hadi Mardani Kamali, Natalie D. Enright Jerger, S. Hessabi
Due to high aggregate idle time of Networks-on-Chip (NoCs) routers in practical applications, power-gating techniques have been proposed to combat the ever-increasing ratio of static power. Nevertheless, the sporadic packet arrivals compromise the effectiveness of power-gating by incurring significant latency and energy overhead. In this paper, we propose a Scalable Pivot-based On/Off Gating Engine (SPONGE) which efficiently manages power-gating decisions and routing mechanism by adaptively selecting a small set of powered-on columns of routers and keeping the others in power-gated state. To this end, a router architecture augmented with a novel routing algorithm is proposed in which a packet can traverse powered-off routers without waking them up, and can only turn in predetermined powered-on routers. Experimental results on SPLASH-2 benchmarks demonstrate that, compared to the conventional power-gating method, SPONGE on average not only improves static power consumption by 81.7%, it also improves average packet latency by 63%.
{"title":"SPONGE: A Scalable Pivot-based On/Off Gating Engine for Reducing Static Power in NoC Routers","authors":"Hossein Farrokhbakht, Hadi Mardani Kamali, Natalie D. Enright Jerger, S. Hessabi","doi":"10.1145/3218603.3218635","DOIUrl":"https://doi.org/10.1145/3218603.3218635","url":null,"abstract":"Due to high aggregate idle time of Networks-on-Chip (NoCs) routers in practical applications, power-gating techniques have been proposed to combat the ever-increasing ratio of static power. Nevertheless, the sporadic packet arrivals compromise the effectiveness of power-gating by incurring significant latency and energy overhead. In this paper, we propose a Scalable Pivot-based On/Off Gating Engine (SPONGE) which efficiently manages power-gating decisions and routing mechanism by adaptively selecting a small set of powered-on columns of routers and keeping the others in power-gated state. To this end, a router architecture augmented with a novel routing algorithm is proposed in which a packet can traverse powered-off routers without waking them up, and can only turn in predetermined powered-on routers. Experimental results on SPLASH-2 benchmarks demonstrate that, compared to the conventional power-gating method, SPONGE on average not only improves static power consumption by 81.7%, it also improves average packet latency by 63%.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74042981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As we approach the limits of 2D device scaling, monolithic 3D IC (M3D) has emerged as a potential solution offering performance and power benefits. Although various studies have been done to increase power savings of M3D designs, efforts to improve their performance are rarely made. In this paper, we, for the first time, perform in-depth analysis of the factors that affect the performance of M3D, and present methodologies to improve the performance. Our methodologies outperform the state-of-the-art M3D design flow by offering 15.6% performance improvement and 16.2% energy-delay product (EDP) benefit over 2D designs.
随着我们接近2D器件扩展的极限,单片3D IC (M3D)已经成为提供性能和功耗优势的潜在解决方案。尽管已经进行了各种各样的研究来提高M3D设计的功耗节约,但很少有人努力提高其性能。在本文中,我们首次对影响M3D性能的因素进行了深入的分析,并提出了提高性能的方法。我们的方法优于最先进的M3D设计流程,比2D设计提供15.6%的性能改进和16.2%的能量延迟产品(EDP)优势。
{"title":"Road to High-Performance 3D ICs: Performance Optimization Methodologies for Monolithic 3D ICs","authors":"Kyungwook Chang, S. Pentapati, D. Shim, S. Lim","doi":"10.1145/3218603.3218636","DOIUrl":"https://doi.org/10.1145/3218603.3218636","url":null,"abstract":"As we approach the limits of 2D device scaling, monolithic 3D IC (M3D) has emerged as a potential solution offering performance and power benefits. Although various studies have been done to increase power savings of M3D designs, efforts to improve their performance are rarely made. In this paper, we, for the first time, perform in-depth analysis of the factors that affect the performance of M3D, and present methodologies to improve the performance. Our methodologies outperform the state-of-the-art M3D design flow by offering 15.6% performance improvement and 16.2% energy-delay product (EDP) benefit over 2D designs.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77302552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ke Xu, Yu Li, Bo Huang, Xiangkai Liu, Hong Wang, Zhuoyan Wu, Zhanpeng Yan, Xueying Tu, Tongqing Wu, Daibing Zeng
This paper presents the design and VLSI implementation of a low-power HEVC main profile encoder, which is able to process up to 4096x2160@30fps 4:2:0 encoding in real-time with five-stage pipeline architecture. A pyramid ME (Motion Estimation) engine is employed to reduce search complexity. To compensate for the video sequences with fast moving objects, GME (Global Motion Estimation) are introduced to alleviate the effect of limited search range. We also implement an alternative 5x5 search along with 3x3 to boost video quality. For intra mode decision, original pixels, instead of reconstructed ones are used to reduce pipeline stall. The encoder supports DVFS (Dynamic Voltage and Frequency Scaling) and features three operating modes, which helps to reduce power consumption by 25%. Scalable quality that trades encoding quality for power by reducing size of search range and intra prediction candidates, achieves 11.4% power reduction with 3.5% quality degradation. Furthermore, a lossless frame buffer compression is proposed which reduced DDR bandwidth by 49.1% and power consumption by 13.6%. The entire video surveillance SoC is fabricated with TSMC 28nm technology with 1.96 mm2 area. It consumes 2.88M logic gates and 117KB SRAM. The measured power consumption is 103mW at 350MHz for 4K encoding with high-quality mode. The 0.39nJ/pixel of energy efficiency of this work, which achieves 42% ~ 97% power reduction as compared with reference designs, make it ideal for real-time low-power smart video surveillance applications.
{"title":"A Low-power 4096x2160@30fps H.265/HEVC Video Encoder for Smart Video Surveillance","authors":"Ke Xu, Yu Li, Bo Huang, Xiangkai Liu, Hong Wang, Zhuoyan Wu, Zhanpeng Yan, Xueying Tu, Tongqing Wu, Daibing Zeng","doi":"10.1145/3218603.3218604","DOIUrl":"https://doi.org/10.1145/3218603.3218604","url":null,"abstract":"This paper presents the design and VLSI implementation of a low-power HEVC main profile encoder, which is able to process up to 4096x2160@30fps 4:2:0 encoding in real-time with five-stage pipeline architecture. A pyramid ME (Motion Estimation) engine is employed to reduce search complexity. To compensate for the video sequences with fast moving objects, GME (Global Motion Estimation) are introduced to alleviate the effect of limited search range. We also implement an alternative 5x5 search along with 3x3 to boost video quality. For intra mode decision, original pixels, instead of reconstructed ones are used to reduce pipeline stall. The encoder supports DVFS (Dynamic Voltage and Frequency Scaling) and features three operating modes, which helps to reduce power consumption by 25%. Scalable quality that trades encoding quality for power by reducing size of search range and intra prediction candidates, achieves 11.4% power reduction with 3.5% quality degradation. Furthermore, a lossless frame buffer compression is proposed which reduced DDR bandwidth by 49.1% and power consumption by 13.6%. The entire video surveillance SoC is fabricated with TSMC 28nm technology with 1.96 mm2 area. It consumes 2.88M logic gates and 117KB SRAM. The measured power consumption is 103mW at 350MHz for 4K encoding with high-quality mode. The 0.39nJ/pixel of energy efficiency of this work, which achieves 42% ~ 97% power reduction as compared with reference designs, make it ideal for real-time low-power smart video surveillance applications.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85661034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}