
2013 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

Design of an ultra-low power device for aircraft structural health monitoring
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.236
A. Perelli, Carlo Caione, L. Marchi, D. Brunelli, A. Marzani, L. Benini
One popular structural health monitoring (SHM) application in both the automotive and aeronautic fields is the non-destructive localization of impacts in plate-like structures. The aim of this paper is to develop a miniaturized, self-contained, low-power device for automated impact detection that can be used in a distributed fashion without central coordination. The proposed device connects an array of four piezoelectric transducers, bonded to the plate and capable of detecting the guided waves generated by an impact, to an STM32F4 board equipped with an ARM Cortex-M4 microcontroller and an IEEE 802.15.4 wireless transceiver. The wave processing and the localization algorithm are implemented on board and optimized for speed and power consumption. In particular, the impact point is localized by cross-correlating, in the warped frequency domain, the signals that the different sensors acquire for the same event. Finally, the performance of the whole system is analysed in terms of localization accuracy and power consumption, showing the effectiveness of the proposed implementation.
Citations: 5
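The cross-correlation step mentioned in the impact-localization abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the paper correlates signals in the warped frequency domain, whereas the code below uses plain time-domain cross-correlation on two synthetic sensor traces to recover the difference in arrival time. The function name, the pulse shape, and the 3-sample delay are all assumptions made for illustration.

```python
def cross_correlation_lag(a, b):
    """Return the lag (in samples) at which b best aligns with a.

    A positive result means b is a delayed copy of a.
    """
    n = len(a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-(n - 1), n):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < len(b):
                score += a[i] * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical data: the same pulse seen by two sensors, 3 samples apart.
pulse = [0.0, 1.0, 2.0, 1.0, 0.0]
sensor_a = pulse + [0.0] * 5
sensor_b = [0.0] * 3 + pulse + [0.0] * 2

lag = cross_correlation_lag(sensor_a, sensor_b)
print(lag)
```

Dividing the lag by the sampling rate gives a time difference of arrival. Combined with a wave speed and the known sensor positions, differences of this kind from several sensor pairs are what a localization algorithm would typically work from.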
Synchronizing code execution on ultra-low-power embedded multi-channel signal analysis platforms
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.090
A. Dogan, R. Braojos, J. Constantin, G. Ansaloni, A. Burg, David Atienza Alonso
Embedded biosignal analysis involves a considerable amount of parallel computation, which can be exploited by employing low-voltage and ultra-low-power (ULP) parallel computing architectures. By allowing data and instruction broadcasting, the single instruction multiple data (SIMD) processing paradigm enables considerable power savings and application speedup, in turn allowing for a lower supply voltage for a given workload. However, state-of-the-art multi-core architectures for biosignal analysis lack a bare, yet smart, synchronization technique among the cores that allows lockstep execution of the algorithm parts that can be performed in SIMD fashion, even in the presence of data-dependent execution flows. In this paper, we propose a lightweight synchronization technique to enhance a ULP multi-core processor, resulting in improved energy efficiency through lockstep SIMD execution. Our results show that the proposed improvements achieve tangible power savings, up to 64% for an 8-core system operating at a workload of 89 MOps/s while exploiting voltage scaling.
Citations: 9
HW-SW integration for energy-efficient/variability-aware computing
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.133
Gasser Ayad, A. Acquaviva, E. Macii, Brahim Sahbi, R. Lemaire
Recent trends in embedded system architectures have brought a rapid shift towards multicore, heterogeneous and reconfigurable platforms. This imposes a large effort on programmers, who must develop their applications to exploit the underlying architecture efficiently. In addition, process variability issues lead to performance and power uncertainties, impacting the expected quality of service and energy efficiency of the running software. In particular, variability may lead to sub-optimal runtime task allocation.
Citations: 6
Thermomechanical stress-aware management for 3D IC designs
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.260
Qiaosha Zou, Zhang Tao, E. Kursun, Yuan Xie
Thermomechanical stress is considered one of the most challenging problems in three-dimensional integrated circuits (3D ICs), due to the thermal expansion coefficient mismatch between the through-silicon vias (TSVs) and the silicon substrate, and to the presence of elevated thermal gradients. To address the stress issue, we propose a thorough solution that combines design-time and run-time techniques to relieve thermomechanical stress and the associated reliability issues. A sophisticated TSV stress-aware floorplan policy is proposed to minimize the possibility of wafer cracking and interfacial delamination. In addition, the run-time thermal management scheme effectively eliminates large thermal gradients between layers. The experimental results show that the reliability of 3D designs can be significantly improved thanks to the reduced TSV thermal load and the elimination of mechanically damaging thermal cycling patterns.
Citations: 13
An enhanced double-TSV scheme for defect tolerance in 3D-IC
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.302
Hsiu-Chuan Shih, Cheng-Wen Wu
Die stacking based on Through-Silicon Vias (TSVs) is considered an efficient way to reduce power consumption and form factor. At the current stage, the failure rate of TSVs is still high, so some type of defect tolerance scheme is required. The concept of double-via, which is normally used in traditional layer-to-layer interconnection, is one feasible tolerance scheme. Double-via/TSV has a benefit compared to TSV repair: it eliminates the fuse configuration procedure as well as the fuse layer. However, double-TSV suffers from signal degradation and leakage caused by short defects. In this work, an enhanced double-TSV scheme is proposed to solve the short-defect problem through signal path division and VDD isolation. Results show that the enhanced double-TSV can tolerate both open and short defects, with reasonable area and timing overhead.
Citations: 0
A gate level methodology for efficient statistical leakage estimation in complex 32nm circuits
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.221
S. Joshi, A. Lombardot, M. Belleville, E. Beigné, S. Girard
A fast and accurate statistical method that estimates the leakage power consumption of CMOS digital circuits at gate level is demonstrated. Means, variances and correlations of logic gate leakages are extracted at the library characterization step and used for the subsequent circuit-level statistical computation. In this paper, the methodology is applied to an eleven-thousand-cell ST test IP. The circuit leakage analysis is 400 times faster than a single fast-SPICE corner analysis, while providing coherent results.
Citations: 3
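The leakage-estimation abstract above says that per-gate means, variances and correlations feed a circuit-level statistical computation, without giving the formulas. One textbook way such moments combine, shown here only as an assumed illustration and not as the paper's method, is that the mean of a sum is the sum of the means, and the variance of a sum is the sum over all gate pairs of correlation times the two standard deviations. All numbers below are made up.

```python
import math


def total_leakage_stats(means, sigmas, corr):
    """Mean and standard deviation of a sum of correlated gate leakages.

    means[i], sigmas[i]: per-gate leakage mean and standard deviation.
    corr[i][j]: correlation coefficient between gates i and j.
    """
    mean = sum(means)
    variance = 0.0
    n = len(means)
    for i in range(n):
        for j in range(n):
            variance += corr[i][j] * sigmas[i] * sigmas[j]
    return mean, math.sqrt(variance)


# Hypothetical three-gate circuit (arbitrary units).
means = [1.0, 2.0, 3.0]
sigmas = [0.1, 0.2, 0.2]
corr = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

mean, sigma = total_leakage_stats(means, sigmas, corr)
print(mean, sigma)
```

The positive correlation between the first two gates makes the total spread larger than it would be if the gates were treated as independent, which is why correlations are worth characterizing alongside means and variances.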
An efficient and flexible hardware support for accelerating synchronization operations on the STHORM many-core architecture
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.119
F. Thabet, Yves Lhuillier, Caaliph Andriamisaina, Jean-Marc Philippe, R. David
The current trend in embedded computing is to increase the number of processing resources on a chip. Following this paradigm, the STMicroelectronics/CEA Platform 2012 (P2012) project designed an area- and power-efficient many-core accelerator as an answer to the computing power needs of next-generation data-intensive embedded applications. Synchronization handling on this architecture was critical, since speed-ups of parallel implementations of embedded applications strongly depend on the ability to exploit the largest possible number of cores while limiting task management overhead. This paper presents the HardWare Synchronizer (HWS), a flexible hardware accelerator for synchronization operations in the P2012 architecture. Experiments on a multi-core test chip showed that the HWS has less than 1% area overhead while reducing synchronization latencies (by up to 2.8 times) and contention.
Citations: 15
Dynamic configuration prefetching based on piecewise linear prediction
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.173
A. Lifa, P. Eles, Zebo Peng
Modern systems demand high performance, as well as high degrees of flexibility and adaptability. Many current applications exhibit dynamic and nonstationary behavior, having certain characteristics in one phase of their execution that will change as the applications enter new phases, in a manner unpredictable at design time. In order to meet the performance requirements of such systems, it is important to have on-line optimization algorithms, coupled with adaptive hardware platforms, that together can adjust to the run-time conditions. We propose an optimization technique that minimizes the expected execution time of an application by dynamically scheduling hardware prefetches. We use a piecewise linear predictor in order to capture correlations and predict the hardware modules to be reached. Experiments show that the proposed algorithm outperforms the previous state of the art, reducing the expected execution time by up to 27% on average.
Citations: 12
Substitute-and-simplify: A unified design paradigm for approximate and quality configurable circuits
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.280
Swagath Venkataramani, K. Roy, A. Raghunathan
Many applications are inherently resilient to inexactness or approximations in their underlying computations. Approximate circuit design is an emerging paradigm that exploits this inherent resilience to realize hardware implementations that are highly efficient in energy or performance. In this work, we propose Substitute-And-SIMplIfy (SASIMI), a new systematic approach to the design and synthesis of approximate circuits. The key insight behind SASIMI is to identify signal pairs in the circuit that assume the same value with high probability, and substitute one for the other. While these substitutions introduce functional approximations, if performed judiciously, they allow some logic to be eliminated from the circuit while also enabling downsizing of gates on critical paths (simplification), resulting in significant power savings. We propose an automatic synthesis framework that performs substitution and simplification iteratively, while ensuring that a user-specified quality constraint is satisfied. We extend the proposed framework to perform automatic synthesis of quality configurable circuits that can dynamically operate at different accuracy levels depending on application requirements. We used SASIMI to automatically synthesize approximate and quality configurable implementations of a wide range of arithmetic units (adders, multipliers, MAC), complex data paths (SAD, FFT butterfly, Euclidean distance) and ISCAS85 benchmarks, using various error metrics such as error rate and average error magnitude. The synthesized approximate circuits demonstrate power improvements of 10%–28% for tight error constraints, and 30%–60% for relaxed error constraints. The quality configurable circuits obtain a 14%–40% improvement in energy in the approximate mode, while incurring no energy overheads in the accurate mode.
Citations: 161
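The key idea in the SASIMI abstract above, finding signal pairs that take the same value with high probability, can be sketched on a toy example. The code below is an illustration under stated assumptions, not the authors' synthesis framework: it enumerates all input vectors of a hypothetical three-input circuit, assumes they are equally likely, and measures how often a target signal agrees with a candidate substitute. The signal definitions and function names are invented for the example.

```python
from itertools import product


def agreement(sig_x, sig_y, n_inputs):
    """Fraction of input vectors on which two signals take the same value.

    Assumes every input vector is equally likely.
    """
    vectors = list(product([0, 1], repeat=n_inputs))
    same = sum(1 for v in vectors if sig_x(*v) == sig_y(*v))
    return same / len(vectors)


def target(a, b, c):
    # Hypothetical internal signal: one AND gate feeding one OR gate.
    return (a & b) | c


def candidate(a, b, c):
    # Hypothetical substitute: an existing signal, here primary input c.
    return c


p = agreement(target, candidate, 3)
print(p)
```

In this toy case the two signals differ only when a and b are 1 and c is 0, so replacing the target with the candidate would remove both gates at the cost of an error on that single input vector. Whether such a substitution is acceptable would depend on the quality constraint, which the abstract says is user-specified.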
A 100 GOPS ASP based baseband processor for wireless communication
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.038
Zhu Ziyuan, Tang Shan, Su Yongtao, Han Juan, Sun Gang, S. Jinglin
This paper presents an ASP (application specific processor) with a 512-bit SIMD (Single Instruction Multiple Data) and 192-bit VLIW (Very Long Instruction Word) architecture optimized for wireless baseband processing. It employs an optimized architecture and address generation unit to accelerate the kernel algorithms. Based on the ASP, a multi-core baseband processor is developed that can operate in a 2×2 MIMO, 20 MHz physical bandwidth configuration for the LTE inner receiver and meets the requirements of Category 3 User Equipment (CAT3 UE). Furthermore, a silicon implementation of the baseband processor in 130 nm CMOS technology is presented. Experimental results show that the baseband processor provides 100 GOPS of computing capability at 117.6 MHz.
Citations: 8
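The throughput figure in the baseband-processor abstract above can be put in per-cycle terms with a little arithmetic. The sketch below only divides the two numbers quoted in the abstract. The abstract does not state how many cores share the work or how an operation is counted, so the result is an aggregate for the whole processor, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
throughput_ops_per_s = 100e9   # 100 GOPS
clock_hz = 117.6e6             # 117.6 MHz

# Aggregate operations completed per clock cycle across the whole processor.
ops_per_cycle = throughput_ops_per_s / clock_hz
print(round(ops_per_cycle, 1))
```

The quotient comes to roughly 850 operations per cycle in aggregate, which is the kind of width that wide SIMD datapaths replicated across several cores are meant to deliver.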