
Latest publications from the 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)

Specification-Driven Automated Conformance Checking for Virtual Prototype and Post-Silicon Designs
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196119
Haifeng Gu, Mingsong Chen, Tongquan Wei, Li Lei, Fei Xie
Due to the increasing complexity of System-on-Chip (SoC) design, ensuring that silicon implementations conform to their high-level specifications is becoming a major challenge. To address this problem, we propose a novel specification-driven conformance checking approach that can automatically identify inconsistencies between different levels of designs. By extending SystemRDL specifications, our approach enables the generation of high-level Formal Device Models (FDMs) that specify the access behaviors of interface registers triggered by driver requests. By symbolically executing the generated FDMs with the same driver requests issued to virtual/silicon devices, our approach can efficiently check whether the designs of an SoC at different levels exhibit unexpected behaviors that are not modeled in the given specification. Experiments on two industrial network adapters demonstrate the effectiveness of our approach in troubleshooting bugs caused by inconsistencies in both virtual and post-silicon prototypes.
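To make the checking idea concrete, the following is a minimal sketch, assuming a toy register model and a made-up trace format (none of this is the authors' FDM notation or tool): the same request sequence is replayed against a high-level model of a register's legal access behaviors, and any observed device response the model does not allow is reported as an inconsistency.

```python
# Hypothetical sketch of specification-driven conformance checking: a tiny
# "formal device model" of one interface register is driven with the same
# requests as the device, and any device response the model does not allow
# is reported as an inconsistency.

class RegisterModel:
    """Toy FDM-like model: legal access behaviors of one interface register."""
    def __init__(self):
        self.value = 0x0

    def legal_responses(self, request):
        kind, data = request
        if kind == "write":
            self.value = data
            return {None}                 # writes return no data
        if kind == "read":
            return {self.value}           # reads must return the last written value
        raise ValueError(f"unknown request kind: {kind}")

def check_conformance(model, device_trace):
    """device_trace: list of (request, observed_response) pairs from the device."""
    for step, (request, observed) in enumerate(device_trace):
        allowed = model.legal_responses(request)
        if observed not in allowed:
            return f"mismatch at step {step}: {request} -> {observed}, expected one of {allowed}"
    return "trace conforms to the specification"

# Example: the device silently drops the second write, so the later read mismatches.
trace = [(("write", 0xA), None), (("write", 0xB), None), (("read", None), 0xA)]
print(check_conformance(RegisterModel(), trace))
```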
Citations: 2
[DAC 2018 Awards - various Awards]
Pub Date : 2018-06-01 DOI: 10.1109/dac.2018.8465923
{"title":"[DAC 2018 Awards - various Awards]","authors":"","doi":"10.1109/dac.2018.8465923","DOIUrl":"https://doi.org/10.1109/dac.2018.8465923","url":null,"abstract":"","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91209695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hierarchical Hyperdimensional Computing for Energy Efficient Classification
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196060
M. Imani, Chenyu Huang, Deqian Kong, T. Simunic
Brain-inspired Hyperdimensional (HD) computing emulates cognition tasks by computing with hypervectors rather than traditional numerical values. In HD, an encoder maps inputs to high dimensional vectors (hypervectors) and combines them to generate a model for each existing class. During inference, HD performs the task of reasoning by looking for similarities between the input hypervector and each pre-stored class hypervector. However, there is no unique encoding in HD that can perfectly map inputs to hypervectors. This results in low HD classification accuracy on complex tasks such as speech recognition. In this paper we propose MHD, a multi-encoder hierarchical classifier, which enables HD to take full advantage of multiple encoders without increasing the cost of classification. MHD consists of two HD stages: a main stage and a decider stage. The main stage makes use of multiple classifiers with different encoders to classify a wide range of input data. Each classifier in the main stage can trade between efficiency and accuracy by dynamically varying the hypervectors’ dimensions. The decider stage, located before the main stage, learns the difficulty of the input data and selects an encoder within the main stage that will provide the maximum accuracy, while also maximizing the efficiency of the classification task. We test the accuracy/efficiency of the proposed MHD on a speech recognition application. Our evaluation shows that MHD can provide a 6.6× improvement in energy efficiency and a 6.3× speedup compared to the baseline single-level HD.
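For background, here is a minimal sketch of the basic HD classification flow the abstract builds on (a single random-projection encoder, bundling by summation, and cosine-similarity inference); it is illustrative only, not the MHD multi-encoder hierarchy, and all parameter values are arbitrary.

```python
import numpy as np

# Toy hyperdimensional classifier: a random bipolar projection as the encoder,
# class hypervectors formed by bundling (summing) encoded training samples,
# and inference by cosine similarity. Dimensions are tiny for readability.
rng = np.random.default_rng(0)
D, F = 1000, 16                          # hypervector dimension, input features
projection = rng.choice([-1, 1], size=(F, D))

def encode(x):
    return np.sign(x @ projection)       # map an input to a bipolar hypervector

def train(samples, labels):
    classes = {}
    for x, y in zip(samples, labels):
        classes[y] = classes.get(y, np.zeros(D)) + encode(x)   # bundle per class
    return classes

def classify(classes, x):
    h = encode(x)
    sim = {y: np.dot(c, h) / (np.linalg.norm(c) * np.linalg.norm(h))
           for y, c in classes.items()}
    return max(sim, key=sim.get)         # most similar class hypervector wins

X = rng.normal(size=(20, F))
y = [i % 2 for i in range(20)]
model = train(X, y)
print(classify(model, X[0]))
```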
Citations: 61
Compensated-DNN: Energy Efficient Low-Precision Deep Neural Networks by Compensating Quantization Errors
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196012
Shubham Jain, Swagath Venkataramani, V. Srinivasan, Jungwook Choi, P. Chuang, Le Chang
Deep Neural Networks (DNNs) represent the state-of-the-art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by the high computation and storage requirements of DNNs, especially for energy-constrained inference tasks at the edge using wearable and IoT devices. One promising approach to alleviate the computational challenges is implementing DNNs using a low-precision fixed point (<16 bits) representation. However, the quantization error inherent in any Fixed Point (FxP) implementation limits the choice of bit-widths that maintain application-level accuracy. Prior efforts recommend increasing the network size and/or re-training the DNN to minimize loss due to quantization, albeit with limited success. Complementary to the above approaches, we present Compensated-DNN, wherein we propose to dynamically compensate the error introduced by quantization during execution. To this end, we introduce a new fixed-point representation, viz. Fixed Point with Error Compensation (FPEC). The bits in FPEC are split between computation bits and compensation bits. The computation bits use conventional FxP notation to represent the number at low precision. On the other hand, the compensation bits (1 or 2 bits at most) explicitly capture an estimate (direction and magnitude) of the quantization error in the representation. For a given word length, since FPEC uses fewer computation bits than an FxP representation, we achieve a near-quadratic improvement in energy in the multiply-and-accumulate (MAC) operations. The compensation bits are simultaneously used by a low-overhead sparse compensation scheme to estimate the error accrued during MAC operations, which is then added to the MAC output to minimize the impact of quantization. We build Compensated-DNNs for 7 popular image recognition benchmarks with 0.05–20.5 million neurons and 0.01–15.5 billion connections. Based on gate-level analysis in 14nm technology, we achieve 2.65×–4.88× and 1.13×–1.7× improvements in energy compared to 16-bit and 8-bit FxP implementations respectively, while maintaining <0.5% loss in classification accuracy.
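The following is a hedged, simplified illustration of the FPEC idea as described above: values are quantized to a few computation bits, a 1-bit error direction plus a coarse magnitude estimate are kept on the side, and the accumulated compensation term is added to the MAC output. The bit widths, the magnitude encoding, and the choice to leave weights unquantized are assumptions for readability, not the paper's exact format.

```python
# Illustrative sketch of fixed point with error compensation: quantize to a few
# computation bits, keep a 1-bit error direction plus a coarse magnitude
# estimate, and fold the accumulated compensation into the MAC result.
# All encodings here are assumptions, not the paper's FPEC format.

def fpec_quantize(x, frac_bits=4):
    step = 2.0 ** -frac_bits
    q = round(x / step) * step            # conventional low-precision fixed point
    err = x - q                           # quantization error
    direction = 1 if err >= 0 else -1     # 1-bit error direction
    magnitude = step / 4 if abs(err) >= step / 4 else 0.0   # coarse estimate
    return q, direction, magnitude

def fpec_dot(xs, ws):
    """MAC over quantized activations, then add the accumulated error estimate.
    Weights are kept exact here purely for brevity."""
    acc, comp = 0.0, 0.0
    for x, w in zip(xs, ws):
        qx, dx, mx = fpec_quantize(x)
        acc += qx * w                     # low-precision multiply-accumulate
        comp += dx * mx * w               # cheap sparse compensation term
    return acc + comp

xs, ws = [0.27, -0.61, 0.93], [0.5, 0.25, -0.125]
exact = sum(x * w for x, w in zip(xs, ws))
print(exact, fpec_dot(xs, ws))            # compensated result is closer to exact
```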
Citations: 56
Test Cost Reduction for X-Value Elimination By Scan Slice Correlation Analysis
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196127
Hyunsu Chae, Joon-Sung Yang
X-values in test output responses corrupt output response compaction and can cause a loss of fault coverage. X-Masking and X-Canceling MISR methods have been proposed to eliminate X-values; however, they incur control data volume and test time overheads. These issues become significant as the complexity and density of circuits increase. This paper proposes a method to eliminate X's by applying a scan-slice-granularity X-value correlation analysis. The proposed method exploits scan slice correlation analysis, determines unique control data for the scan slice groups sharing the same control data, and applies them to each scan slice. Hence, the volume of control data can be significantly reduced. Simulation results demonstrate that the proposed method achieves greater reductions in control data and test time compared to conventional methods, without loss of fault coverage.
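A minimal sketch of the control-data reduction idea, under the assumption that a slice's control word is fully determined by its X positions: slices whose X positions imply the same mask are grouped so that one control word is stored per group rather than per slice. The slice width and mask encoding are illustrative, not the paper's.

```python
# Hedged sketch of control-data reduction by scan slice correlation: scan
# slices whose X-value positions imply the same mask share a single control
# word. The 4-bit slice width and mask encoding are illustrative assumptions.

def slice_mask(slice_bits):
    """1 marks a position that must be masked (X), 0 a known value."""
    return tuple(1 if b == 'X' else 0 for b in slice_bits)

def group_slices(slices):
    groups = {}                            # mask -> list of slice indices
    for i, s in enumerate(slices):
        groups.setdefault(slice_mask(s), []).append(i)
    return groups

slices = ["01X0", "10X0", "0110", "X1X0", "1100"]
groups = group_slices(slices)
print(f"{len(slices)} slices need only {len(groups)} unique control words")
for mask, idxs in groups.items():
    print(mask, "->", idxs)
```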
Citations: 0
Application Level Hardware Tracing for Scaling Post-Silicon Debug
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3195992
D. Pal, Abhishek Sharma, S. Ray, F. M. D. Paula, Shobha Vasudevan
We present a method for selecting trace messages for post-silicon validation of Systems-on-Chip (SoCs) with diverse usage scenarios. We model specifications of interacting flows in typical applications. Our method optimizes trace buffer utilization and flow specification coverage. We present debugging and root-cause analysis of subtle bugs in the industry-scale OpenSPARC T2 processor. We demonstrate that this scale is beyond the capacity of current tracing approaches. We achieve a trace buffer utilization of 98.96% with a flow specification coverage of 94.3% (average). We localize bugs to 21.11% (average) of the potential root causes in our large-scale debugging effort.
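To illustrate the kind of optimization involved, here is a sketch that casts trace message selection as coverage maximization under a trace buffer budget and solves it with a simple greedy heuristic; the message names, widths, and the greedy strategy itself are assumptions standing in for the paper's actual selection method.

```python
# Hedged illustration of the underlying optimization problem: pick trace
# messages that maximize coverage of flow-specification events within a trace
# buffer budget. A greedy coverage heuristic stands in for the paper's method.

def select_messages(candidates, buffer_budget):
    """candidates: {message_name: (width_in_bits, set_of_covered_flow_events)}"""
    chosen, covered, used = [], set(), 0
    while True:
        best, best_gain = None, 0
        for name, (width, events) in candidates.items():
            if name in chosen or used + width > buffer_budget:
                continue
            gain = len(events - covered)          # newly covered flow events
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            return chosen, covered
        chosen.append(best)
        covered |= candidates[best][1]
        used += candidates[best][0]

candidates = {
    "dma_req":   (16, {"dma.start", "dma.done"}),
    "cache_ack": (8,  {"cache.fill", "dma.done"}),
    "pcie_hdr":  (32, {"pcie.tlp", "pcie.cpl", "dma.start"}),
}
print(select_messages(candidates, buffer_budget=40))
```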
Citations: 4
Efficient and Reliable Power Delivery in Voltage-Stacked Manycore System with Hybrid Charge-Recycling Regulators
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196037
An Zou, Jingwen Leng, Xin He, Yazhou Zu, V. Reddi, Xuan Zhang
Voltage stacking (VS) fundamentally improves power delivery efficiency (PDE) by series-stacking multiple voltage domains to eliminate explicit step-down voltage conversion and reduce energy loss along the power delivery path. However, it suffers from aggravated supply noise, preventing its adoption in mainstream computing systems. In this paper, we investigate a practical approach to enabling efficient and reliable power delivery in voltage-stacked manycore systems that can ensure worst-case supply noise reliability without excessive, costly over-design. We start by developing an analytical model to capture the essential noise behaviors in VS. It allows us to identify the dominant noise contributor and derive the worst-case conditions. With this in-depth understanding, we propose a hybrid voltage regulation solution to effectively mitigate noise with worst-case guarantees. When evaluated with real-world benchmarks, our solution achieves 93.8% power delivery efficiency, an improvement of 13.9% over the conventional baseline.
Citations: 13
Approximation-Aware Coordinated Power/Performance Management for Heterogeneous Multi-cores
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3195994
A. Kanduri, A. Miele, A. Rahmani, P. Liljeberg, C. Bolchini, N. Dutt
Run-time resource management of heterogeneous multi-core systems is challenging due to i) dynamic workloads, which often result in ii) conflicting knob actuation decisions, which potentially iii) compromise performance for thermal safety. We present a runtime resource management strategy that provides performance guarantees under power constraints using functionally approximate kernels that exploit accuracy-performance trade-offs within error-resilient applications. Our controller integrates approximation with power knobs (DVFS, CPU quota, task migration) in a coordinated manner to make performance-aware decisions on power management under variable workloads. Experimental results on the Odroid XU3 show the effectiveness of this strategy in meeting performance requirements without power violations compared to existing solutions.
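A minimal sketch of what such a coordinated decision loop could look like, with made-up knob names, orderings, and thresholds (this is not the paper's controller): under a power violation the loop relaxes accuracy before throttling, and under a performance shortfall it spends DVFS and CPU-quota headroom before resorting to approximation.

```python
# Hedged sketch of a coordinated power/performance decision loop. Knob names,
# priority order, and limits are illustrative assumptions, not the paper's
# controller design.

def control_step(perf, perf_target, power, power_cap, knobs):
    if power > power_cap:                         # power violation: act first
        if knobs["approx_level"] < 3:
            knobs["approx_level"] += 1            # relax accuracy before throttling
        else:
            knobs["dvfs_level"] = max(0, knobs["dvfs_level"] - 1)
    elif perf < perf_target:                      # performance shortfall
        if knobs["dvfs_level"] < 4:
            knobs["dvfs_level"] += 1              # boost frequency if headroom exists
        elif knobs["cpu_quota"] < 100:
            knobs["cpu_quota"] = min(100, knobs["cpu_quota"] + 10)
        else:
            knobs["approx_level"] = min(3, knobs["approx_level"] + 1)
    return knobs

knobs = {"dvfs_level": 2, "cpu_quota": 60, "approx_level": 0}
print(control_step(perf=0.8, perf_target=1.0, power=3.5, power_cap=4.0, knobs=knobs))
```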
Citations: 21
An Efficient Bayesian Yield Estimation Method for High Dimensional and High Sigma SRAM Circuits
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3195987
Jinyuan Zhai, Changhao Yan, Sheng-Guo Wang, Dian Zhou
With the increasing dimensionality of the variation space and computationally intensive circuit simulation, accurate and fast yield estimation of realistic SRAM chips remains a significant and complicated challenge. In this paper, we propose an efficient Bayesian yield estimation method for high-dimensional and high-sigma SRAM circuits. Experimental results show that the proposed method has an almost constant time complexity as the dimension increases, and gains a 6× speedup over the state-of-the-art method in the 485D cases.
Citations: 5
Data Prediction for Response Flows in Packet Processing Cache
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196021
Hayato Yamaki, H. Nishi, Shinobu Miwa, H. Honda
We propose a technique to reduce compulsory misses of the packet processing cache (PPC), which largely affects both the throughput and energy of core routers. Rather than prefetching data, our technique, called response prediction cache (RPC), speculatively stores predicted data into the PPC without additional accesses to the low-throughput and power-consuming memory (i.e., TCAM). Based on the request-response model of Internet communications, RPC predicts the data related to a response flow at the arrival of the corresponding request flow. RPC can improve the cache miss rate, throughput, and energy efficiency of PPC systems by 15.3%, 17.9%, and 17.8%, respectively.
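A minimal sketch of the response-prediction idea, assuming a 5-tuple flow key and a trivial prediction function (both are illustrative, not the paper's design): on a request-flow miss, the cache also pre-installs a predicted entry for the reverse flow so that the first response packet hits without a TCAM access.

```python
# Hedged sketch of response prediction in a packet processing cache: on a
# request-flow lookup that misses, also pre-install a predicted entry for the
# reverse (response) flow, so the first response packet avoids the TCAM.
# The flow-key layout and the predicted data are illustrative assumptions.

cache = {}                                   # flow 5-tuple -> processing result

def reverse_flow(key):
    src, dst, sport, dport, proto = key
    return (dst, src, dport, sport, proto)   # response flow swaps endpoints

def lookup(key, tcam_lookup, predict_response):
    if key in cache:
        return cache[key]                    # fast path: cache hit
    result = tcam_lookup(key)                # slow, power-hungry TCAM path
    cache[key] = result
    rkey = reverse_flow(key)
    if rkey not in cache:
        cache[rkey] = predict_response(result)   # speculative response entry
    return result

def tcam_lookup(key):                        # stand-in for the real TCAM search
    return {"action": "forward", "port": hash(key) % 4}

def predict_response(request_result):        # placeholder prediction function
    return {"action": "forward", "port": 0}  # e.g., back toward the requester

req = ("10.0.0.1", "10.0.0.2", 12345, 80, "tcp")
lookup(req, tcam_lookup, predict_response)
print(reverse_flow(req) in cache)            # True: response flow pre-installed
```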
Citations: 7