Proceedings of the 2016 International Symposium on Low Power Electronics and Design: Latest Articles

A Low Power Current-Mode Flash ADC with Spin Hall Effect based Multi-Threshold Comparator

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/2934583.2934642

Zhezhi He, Deliang Fan

Current-mode Analog-to-Digital Converters (ADCs) have drawn much attention due to their high operating speed and immunity to power and ground noise, among other advantages. However, a traditional n-bit current-mode ADC design requires 2^n - 1 comparators, leading to inevitably high power consumption and large chip area. In this work, we propose a low-power and compact current-mode Multi-Threshold Comparator (MTC) based on the giant Spin Hall Effect (SHE). The two threshold currents of the proposed SHE-MTC are 200 μA and 250 μA, with a 1 ns switching time. The proposed current-mode hybrid spin-CMOS flash ADC based on the SHE-MTC reduces the number of comparators by almost half (to 2^(n-1)), correspondingly reducing the required current mirror branches, total power consumption, and chip area. Moreover, due to the non-volatility of the SHE-MTC, the front-end analog circuits can be switched off when they are not required, further increasing power efficiency. The device dynamics of the SHE-MTC are simulated using a numerical device model based on the Landau-Lifshitz-Gilbert (LLG) equation with Spin-Transfer Torque (STT) and SHE terms. Device-circuit co-simulation in SPICE (45 nm CMOS technology) shows that the average power dissipation of the proposed ADC is 1.9 mW when operating at 500 MS/s with a 1.2 V power supply. The INL and DNL are within 0.23 LSB and 0.32 LSB, respectively.
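The comparator saving described in this abstract is simple arithmetic, and a short sketch may make it concrete. The code below assumes that each multi-threshold comparator resolves two thresholds (the abstract reports two threshold currents per SHE-MTC) and that thresholds are packed evenly across comparators. The function names are illustrative and do not come from the paper.

```python
def conventional_comparators(n_bits: int) -> int:
    """Comparators in a conventional n-bit flash ADC: one per threshold, 2^n - 1."""
    return 2 ** n_bits - 1


def mtc_comparators(n_bits: int, thresholds_per_mtc: int = 2) -> int:
    """Comparators needed if each one resolves `thresholds_per_mtc` thresholds.

    Assumption: thresholds are packed evenly, so the count is the ceiling of
    (2^n - 1) / thresholds_per_mtc. For two thresholds this equals 2^(n-1).
    """
    thresholds = 2 ** n_bits - 1
    return -(-thresholds // thresholds_per_mtc)  # ceiling division


if __name__ == "__main__":
    for n in (3, 4, 6):
        print(n, conventional_comparators(n), mtc_comparators(n))
```

Under these assumptions the reduced count is always one more than half of the conventional count, which matches the "almost by half" wording.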



Citations: 17

Overview of IEEE1801-2015: Standard for Design and Verification of Low-Power, Energy-Aware Electronic Systems: Invited Paper

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/2934583.2962724

Sushma Honnavara-Prasad

P1801 is an IEEE-SA entity-based working group with a wide range of participants, including EDA vendors, semiconductor companies, and IP providers. IEEE 1801-2015 is the latest revision of the standard, which made its debut in 2009. This talk gives an overview of IEEE 1801-2015 and elaborates on the new capabilities and updates in the 2015 revision compared with the previous release. Some of the key updates that will be discussed include:
• A major revision of the definition of power states and transitions.
• New concepts related to hard and soft macros and bottom-up implementation.
• Support for IP component power modeling for system-level power analysis.
• Generalization of supply states to apply to all supply objects, not just ports.
• Generalization of power models to represent hard and soft macros, power models, etc.
• New power groups concept for defining a group of related power states.
• Support for user-defined supply resolution functions and related semantics.
• Support for find_objects to find supply ports.
• Ability to map repeater strategies to library cells.
• New model for nominal supply values, supply source variation, and correlation.
• Clarification of the level shifter insertion algorithm, and removal of the default level shifter strategy.
• New Information Model (clause 10) and API (UPF packages and Tcl bindings).
• New query functions.
• New usage examples illustrating Successive Refinement and Bottom-Up implementation.



Citations: 0

Extending the Moore's law by exploring new data center architecture: Invited Paper

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/2934583.2953981

Jian Ouyang, Wei Qi, Yong Wang

Over the past ten years, many new applications have emerged, such as AI, big data, and cloud. Although the workloads of these applications are very diverse, they all demand huge data center resources. In contrast, silicon technology is advancing more and more slowly as Moore's law approaches its end. Consequently, data centers built from commodity hardware cannot provide sufficient cost efficiency and power efficiency. To meet the growing resource needs of emerging applications, data centers are becoming larger and larger, consuming huge amounts of power and hardware cost. From a business perspective, the slow development of hardware technology limits the value creation of emerging applications. We at Baidu, the largest search engine in China, faced this challenge several years ago. We find that the number of servers increases much faster than the scale of the business. This situation is common among internet companies because the iteration of general-purpose processors is becoming slower and slower. For example, Intel announced earlier this year that its Tick-Tock production strategy was out of date. This problem drives us to look for new methods to boost business. From an internet company's perspective, building new chips or new architectures based on the characteristics of its applications makes sense. This approach can break through the limitations of commodity chips and commodity hardware. According to academic and industry experience, domain-specific architectures can achieve much better performance and power efficiency than general-purpose architectures. Consequently, we are exploring new architectures to extend Moore's law. In this paper, we present our work on exploring new architectures for the data center. Data center resources include storage, memory, computing, and networking. Hence, we focus on these four areas. First, we implemented SDF for large-scale distributed storage systems. SDF aims to provide a low-cost, high-performance flash storage system. Second, we implemented SDA for deep learning and big data. SDA is dedicated to solving the computing bottleneck of emerging applications. The rest of the paper is organized as follows. Section 2 covers SDF [1]. Section 3 describes SDA for deep learning [2]. Section 4 presents SDA for big data [3]. The last section is the conclusion.



Citations: 0

Can We Guarantee Performance Requirements under Workload and Process Variations?

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/2934583.2934641

Dimitrios Stamoulis, Diana Marculescu

Modern many-core systems must cope with a wide range of heterogeneity due to both manufacturing process variations and the extreme requirements of multi-application, multithreaded workloads. The latter is increasingly challenging when each multithreaded application has different performance constraints. Existing thread mapping methods primarily focus on maximizing performance under a global power budget and fail to provide thread- and application-specific performance guarantees. This paper provides a comprehensive approach for variation- and workload-aware thread mapping on heterogeneous multi-core systems that satisfies per-application performance requirements and is aware of manufacturing process variation, together with an analysis of its robustness to uncertainties in the power and performance models. We formulate the variation-aware mapping problem as a constrained 0-1 integer linear program (ILP) and propose a heuristic-based algorithm for solving it efficiently. Compared with an optimal solver, our method produces results less than 10% away from the optimum on average, with a four-orders-of-magnitude improvement in runtime. Moreover, the newly proposed method is robust to model uncertainty and meets per-application performance requirements, while agnostic approaches result in performance bound violations (up to 100% in many cases).
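To illustrate what a constrained 0-1 mapping problem of this kind can look like, the sketch below assigns each thread to exactly one core so that every thread's minimum frequency is met and total power is minimized. It compares an exhaustive search with a simple greedy heuristic. This is not the authors' formulation or algorithm: the objective, the constraint form, the function names, and all numbers are assumptions chosen only for illustration.

```python
from itertools import permutations


def optimal_mapping(min_freq, core_freq, core_power):
    """Exhaustive 0-1 assignment: thread t runs on core perm[t], one thread per core.

    A mapping is feasible if core_freq[c] >= min_freq[t] for every assigned pair.
    Returns the feasible mapping with the lowest total power.
    """
    best, best_power = None, float("inf")
    for perm in permutations(range(len(core_freq)), len(min_freq)):
        if all(core_freq[c] >= min_freq[t] for t, c in enumerate(perm)):
            power = sum(core_power[c] for c in perm)
            if power < best_power:
                best, best_power = perm, power
    return best, best_power


def greedy_mapping(min_freq, core_freq, core_power):
    """Heuristic: serve the most demanding thread first with the cheapest feasible core."""
    free = set(range(len(core_freq)))
    mapping = {}
    for t in sorted(range(len(min_freq)), key=lambda t: -min_freq[t]):
        feasible = [c for c in free if core_freq[c] >= min_freq[t]]
        if not feasible:
            return None, float("inf")
        chosen = min(feasible, key=lambda c: core_power[c])
        mapping[t] = chosen
        free.remove(chosen)
    perm = tuple(mapping[t] for t in range(len(min_freq)))
    return perm, sum(core_power[c] for c in perm)


# Hypothetical data: per-thread minimum frequency (GHz), and per-core achievable
# frequency (GHz) and power (W), which differ across cores due to process variation.
MIN_FREQ = [2.0, 1.0, 1.5]
CORE_FREQ = [2.2, 1.6, 1.1, 2.0]
CORE_POWER = [3.0, 1.8, 1.0, 2.6]

if __name__ == "__main__":
    print(optimal_mapping(MIN_FREQ, CORE_FREQ, CORE_POWER))
    print(greedy_mapping(MIN_FREQ, CORE_FREQ, CORE_POWER))
```

The exhaustive search grows factorially with the number of cores, which is the kind of cost that motivates a heuristic; the greedy rule here happens to match the optimum on this small instance but carries no general guarantee.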



Citations: 15

Modeling and implementation of a fully-digital integrated per-core voltage regulation system in a 28nm high performance 64-bit processor

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/2934583.2934586

R. Rachala, Miguel Rodriguez, S. Kosonocky, Milos Trajkovic

This paper describes modeling and implementation of a fully digital integrated linear voltage regulation system implemented in a 28nm x86-64 core to reduce power gating entry or exit latency. Running on a 100 MHz clock, the controller samples voltage using a time-to-digital converter, and controls a set of PFETs organized in a ring topology around the CPU cores to drop voltage down to a specified target value. A simple analytical model is developed and validated through fast Matlab-Simulink simulation, enabling quick design turnaround and reducing schedule impact. The regulation system is designed to support input-output voltages in the range 1.3 V - 0.55 V. Digitally-controlled header resistance values range from 1.5 Ω to 2 mΩ. Stable processor behavior is observed down to 0.6 V, enabling fast pseudo-power gating entry and exit. In a high-performance x86-64 dual-core microprocessor chip, the controller enables an effective 6% frequency increase for lightly threaded applications by increasing the boost state residency.
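The control idea in this abstract (sample the core voltage, then switch header PFETs on or off to reach a target) can be sketched with a very simple behavioral model. The code below is a hypothetical bang-bang controller with a constant-current load, not the controller or the analytical model from the paper. The unit resistance and device count are assumptions picked so that the header resistance spans the 1.5 Ω to 2 mΩ range quoted above; the load current and target voltage are also made up.

```python
def simulate_regulator(v_in=1.3, v_target=0.9, i_load=2.0,
                       r_unit=1.5, n_max=750, cycles=2000):
    """Bang-bang digital control of n parallel header PFETs.

    Header resistance is r_unit / n_on, and the core voltage is modeled as
    v_in - i_load * r_header (constant-current load, an assumption).
    Each controller cycle enables one more PFET if the core is below target,
    or disables one if it is above target.
    """
    n_on = n_max  # start fully on, so the header is at its minimum resistance
    trace = []
    for _ in range(cycles):
        v_core = v_in - i_load * (r_unit / n_on)
        trace.append(v_core)
        if v_core < v_target and n_on < n_max:
            n_on += 1
        elif v_core > v_target and n_on > 1:
            n_on -= 1
    return n_on, trace


if __name__ == "__main__":
    final_n, trace = simulate_regulator()
    print(final_n, round(trace[-1], 3))
```

With a 100 MHz controller clock each loop iteration would correspond to 10 ns. A one-device-per-cycle step is the slowest possible policy and is used here only to keep the sketch short; it also shows the limit-cycle ripple that any such quantized controller has to manage.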



Citations: 1

Session details: Non-Volatile Memory: Technology & System

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/3256023

Yongpan Liu, S. Mukhopadhyay

Citations: 0

Four-tier Monolithic 3D ICs: Tier Partitioning Methodology and Power Benefit Study

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/2934583.2934623

Kwang Min Kim, S. Sinha, B. Cline, G. Yeric, S. Lim

Monolithic 3D IC is an emerging technology for continuing to satisfy demands for power reduction under the challenges posed by traditional device scaling. In this paper, for the first time, we study the power benefits of 4-tier monolithic 3D ICs compared with 2-tier monolithic 3D ICs and 2D ICs. We present a tier partitioning methodology that significantly extends the capability of a state-of-the-art flow built for 2-tier monolithic 3D ICs. We develop two complete RTL-to-GDSII design flows to achieve this goal and offer quantitative comparisons. In addition, we study the impact of inter-tier via usage on 2-tier and 4-tier monolithic 3D ICs. Our experiments show that poorly controlled inter-tier via usage results in up to 6.05% degradation in total power savings. We therefore develop an effective strategy for choosing inter-tier via configurations that optimize power metrics. Experiments show that, at the same performance, 4-tier monolithic 3D ICs outperform 2-tier monolithic 3D ICs and 2D ICs by 15% and 50% in power and by 25% and 75% in area, respectively.


Citations: 9

T-DVS: Temperature-aware DVS based on Temperature Inversion Phenomenon

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/2934583.2934631

Jinsoo Park, H. Cha

Dynamic Voltage and Frequency Scaling (DVFS) is a widely used methodology for reducing the power consumption of mobile devices. The scheme scales frequency according to a specific governor and sets an operating voltage paired with each frequency. Temperature is one of the critical parameters affecting device operation. In practice, a guard-band is built into the operating voltage to ensure safe processor operation even at the worst-case temperature, so DVFS can be further optimized in terms of operating voltage under nominal conditions. In this paper, we propose Temperature-aware DVS (T-DVS), which aggressively reduces the voltage guard-band. We explore the opportunity of providing the minimum operating voltage for each frequency at different temperatures and realize a dynamic voltage control scheme to optimize power consumption. The effectiveness of T-DVS is validated under various thermal conditions using a multi-core application processor. We experimentally observe that T-DVS yields a voltage gain without performance degradation, regardless of thermal conditions and chip characteristics. Using off-the-shelf smartphones, we show that the voltage gain achieved by the scheme increases battery lifetime.
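One way to picture a temperature-aware voltage scheme is a lookup table indexed by frequency and temperature bin, replacing a single worst-case voltage per frequency. The sketch below is hypothetical: the bin edges and voltages are invented, and the downward trend of voltage with rising temperature is an assumption suggested by the temperature-inversion effect named in the title, not data from the paper.

```python
import bisect

# Hypothetical temperature bins in degrees C: <20, 20 to <40, 40 to <60, >=60.
TEMP_BIN_EDGES_C = [20, 40, 60]

# Hypothetical minimum safe voltage (mV) per frequency (MHz) and temperature bin.
VMIN_TABLE_MV = {
    600: [900, 880, 860, 850],
    1200: [1000, 975, 950, 940],
}

# A temperature-unaware scheme must use the worst case across all bins.
WORST_CASE_MV = {freq: max(volts) for freq, volts in VMIN_TABLE_MV.items()}


def select_voltage_mv(freq_mhz: int, temp_c: float) -> int:
    """Return the table voltage for the bin that contains temp_c."""
    bin_index = bisect.bisect_right(TEMP_BIN_EDGES_C, temp_c)
    return VMIN_TABLE_MV[freq_mhz][bin_index]


def voltage_gain_mv(freq_mhz: int, temp_c: float) -> int:
    """Voltage saved relative to the worst-case guard-banded setting."""
    return WORST_CASE_MV[freq_mhz] - select_voltage_mv(freq_mhz, temp_c)


if __name__ == "__main__":
    for temp in (10, 45, 70):
        print(temp, select_voltage_mv(1200, temp), voltage_gain_mv(1200, temp))
```

A real implementation would also need hysteresis around the bin edges and a margin for sensor error, both of which are left out here.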



Citations: 4

FVCAG: A framework for formal verification driven power modeling and verification

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/2934583.2934633

Arun Joseph, Spandana Rachamalla, R. Rao, A. Haridass, P. K. Nalla

Generating accurate IP power models requires determining the correct simulation conditions for the different input pins of the IP. Determining such a set of inputs for individual IP blocks in a design is expensive in cost and time, and is also highly error-prone. Additionally, it is desirable to identify IP instances in a design where these simulation conditions are not met. These are relevant problems in the context of modern microprocessor designs, which use a very large number of IPs, either developed in-house or sourced from external vendors. In this paper, we examine these problems in an industrial context and introduce FVCAG, a framework for enabling efficient and accurate power modeling. FVCAG enables more thorough IP power modeling than can be accomplished using current state-of-the-art techniques. Experimental evaluation of the proposed framework on the standard cell library and macros used in the design of an industry-class high-performance microprocessor demonstrates the accuracy and efficiency of the proposed framework.



Citations: 1

A Fully Parameterizable Low Power Design of Vector Fused Multiply-Add Using Active Clock-Gating Techniques

Proceedings of the 2016 International Symposium on Low Power Electronics and Design

Pub Date : 2016-08-08 DOI: 10.1145/2934583.2934587

Ivan Ratković, Oscar Palomar, Milan Stanic, O. Unsal, A. Cristal, M. Valero

The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need to be re-tailored for the mobile market that they are now entering. The floating-point fused multiply-add unit, being a power-hungry functional unit, deserves special attention. Although clock-gating is a well-known method for reducing switching power in synchronous designs, there are unexplored opportunities for applying it to vector processors, especially in active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector fused multiply-add units (VFU). These techniques ensure power savings without jeopardizing timing. Using vector masking and vector multi-lane-aware clock-gating, we report power reductions of up to 52%, assuming an active VFU operating at peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector floating-point instructions. We perform this research in a fully parameterizable and automated fashion using various tools at both the architectural and circuit levels.
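
As a rough illustration of why vector masking and lane awareness create gating opportunities even while the unit is active, the sketch below counts the lane-cycles in which a lane has no active element to process. It is not the authors' method: the function name, the `num_lanes` parameter, and the round-robin mapping of elements to lanes are assumptions made for this example, and the count is a measure of gating opportunity, not a power estimate.

```python
from math import ceil


def gated_lane_cycles(mask, num_lanes):
    """Count lane-cycles that could be clock-gated for one vector instruction.

    mask      -- list of bools, one per vector element (True = element active)
    num_lanes -- number of parallel FMA lanes (hypothetical parameter)

    Assumption: element i is processed by lane i % num_lanes in cycle
    i // num_lanes, so the instruction occupies ceil(len(mask) / num_lanes)
    cycles on every lane.

    Returns (gated, total) lane-cycles.
    """
    if num_lanes <= 0:
        raise ValueError("num_lanes must be positive")
    cycles = ceil(len(mask) / num_lanes)
    total = cycles * num_lanes
    active = sum(1 for m in mask if m)
    # A lane-cycle is gateable if its element is masked off, or if the lane
    # has no element at all in the final, partially filled cycle.
    return total - active, total


# Example: 10 elements, 6 of them active, executed on 4 lanes.
gated, total = gated_lane_cycles([True] * 6 + [False] * 4, num_lanes=4)
print(f"{gated} of {total} lane-cycles are candidates for clock-gating")
```

In this toy case the instruction occupies 3 cycles on 4 lanes, and half of the 12 lane-cycles carry no useful work. The 52% figure in the abstract is the authors' measured result and is not derived from this model.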

Citations: 2