
2014 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Articles

Early design stage thermal evaluation and mitigation: The locomotiv architectural case
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.327
Tanguy Sassolas, C. Sandionigi, Alexandre Guerre, Alexandre Aminot, P. Vivet, H. Boussetta, L. Ferro, N. Peltier
To offer more computing power in modern SoCs, transistors keep scaling to new technology nodes. Consequently, power density is increasing, which raises thermal risks. Thermal issues need to be addressed as early as possible in the design flow, when optimization opportunities are greatest. At early design stages, architects rely on virtual prototypes to model their designs' behavior with a suitable trade-off between accuracy and simulation speed. Unfortunately, accurate virtual prototypes cannot cover the timescale of thermal effects. In this paper, we demonstrate that less accurate high-level architectural models, combined with efficient power and thermal simulation tools, provide a suitable environment for analyzing thermal issues and designing software thermal mitigation solutions, using the Locomotiv MPSoC architecture as a case study.
Citations: 3
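For the Locomotiv entry, a back-of-envelope lumped RC thermal model shows why thermal effects escape accurate virtual prototypes: with a thermal time constant of R·C = 0.1 s, watching a block heat up already means simulating hundreds of millions of cycles at GHz clock rates. The Python sketch below is not the Locomotiv tool flow. The single-node model, the constants T_AMB, R_TH and C_TH, and the hysteresis throttling policy are all hypothetical. It integrates C·dT/dt = P - (T - T_amb)/R with forward Euler and shows one simple form of software thermal mitigation: switching to a lower-power mode above a temperature limit.

```python
# Single-node lumped RC thermal model with a hysteresis throttling policy.
# All constants are hypothetical illustration values, not figures from the paper.

T_AMB = 45.0   # ambient/package temperature in deg C (assumed)
R_TH = 2.0     # thermal resistance in K/W (assumed)
C_TH = 0.05    # thermal capacitance in J/K (assumed); time constant R_TH * C_TH = 0.1 s
DT = 1e-3      # forward-Euler step in seconds


def simulate(power_high, power_low, t_limit, duration):
    """Integrate C dT/dt = P - (T - T_AMB) / R and return one temperature per step.

    Hypothetical software mitigation: run at power_high until T exceeds t_limit,
    then drop to power_low until T falls 2 K below the limit (hysteresis).
    """
    temp = T_AMB
    throttled = False
    trace = []
    for _ in range(round(duration / DT)):
        if temp > t_limit:
            throttled = True
        elif temp < t_limit - 2.0:
            throttled = False
        power = power_low if throttled else power_high
        temp += DT * (power - (temp - T_AMB) / R_TH) / C_TH
        trace.append(temp)
    return trace
```

With these constants, an unthrottled 30 W load (simulate(30.0, 30.0, 85.0, 1.0)) settles near T_AMB + P·R_TH = 105 °C. By contrast, simulate(30.0, 10.0, 85.0, 1.0) keeps the peak within a fraction of a kelvin of the 85 °C limit.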
VRCon: Dynamic reconfiguration of voltage regulators in a multicore platform
Pub Date : 2014-03-24 DOI: 10.7873/DATE2014.378
Woojoo Lee, Yanzhi Wang, Massoud Pedram
The emerging trend toward chip multiprocessors (CMPs) that support dynamic voltage and frequency scaling (DVFS) is driven by user demand for both high performance and low power. To overcome the limitations of conventional chip-wide DVFS and achieve the maximum possible energy savings, recent CMP offerings enable per-core DVFS. Per-core DVFS reduces the power consumed by the CMP, but the power dissipated by the many voltage regulators (VRs) needed to support it becomes critical. This paper focuses on the dynamic control of the VRs in a CMP platform. Starting from a proposed platform with a configurable VR-to-core power distribution network, we present two optimization methods to maximize system-wide energy savings: (i) reactive VR consolidation, which reconfigures the network to maximize the VRs' power conversion efficiency under predetermined per-core DVFS levels, and (ii) proactive VR consolidation, which selects new DVFS levels to maximize total energy savings without any performance degradation. Detailed experiments show up to a 35% reduction in VR energy loss and a 14% total energy saving.
Citations: 23
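For the VRCon entry, a toy loss model illustrates the trade-off behind reactive VR consolidation. In this model, each active VR pays a fixed loss plus a conduction loss that grows with the square of its current. Sharing one voltage domain's load among fewer VRs therefore saves fixed loss at the price of extra conduction loss. The sketch below rests on several assumptions: that loss model, the hypothetical constants P_FIXED and R_COND, and the premise that cores already assigned the same DVFS voltage can share VRs through the configurable network. It is not the paper's formulation, and it does not cover proactive consolidation, which would also change the DVFS levels.

```python
# Toy reactive VR consolidation: choose how many VRs serve each voltage domain.
# Hypothetical loss model for one active VR delivering current i (amps):
#     loss(i) = P_FIXED + R_COND * i**2
# i.e. a fixed switching/control loss plus a conduction loss. Not the paper's model.
from collections import defaultdict

P_FIXED = 0.15   # W lost per active VR regardless of load (assumed)
R_COND = 0.02    # conduction-loss coefficient in ohms (assumed)


def group_loss(total_current, k):
    """Total VR loss when k VRs share total_current equally."""
    return k * (P_FIXED + R_COND * (total_current / k) ** 2)


def consolidate(cores):
    """cores: list of (voltage, current) pairs with per-core DVFS levels already fixed.

    Assumes cores at the same voltage may share VRs through a configurable network.
    For each voltage group, every VR count from 1 to the group size is tried and
    the lowest-loss one kept. Returns {voltage: (num_vrs, loss_watts)}.
    """
    groups = defaultdict(list)
    for voltage, current in cores:
        groups[voltage].append(current)
    plan = {}
    for voltage, currents in groups.items():
        total = sum(currents)
        best_k = min(range(1, len(currents) + 1), key=lambda k: group_loss(total, k))
        plan[voltage] = (best_k, group_loss(total, best_k))
    return plan


def per_core_loss(cores):
    """Baseline: one dedicated VR per core."""
    return sum(group_loss(current, 1) for _, current in cores)
```

Take four cores drawing 2 A each at 0.9 V plus one 5 A core at 1.1 V. For this case, consolidate picks three VRs for the 0.9 V group, which cuts the modeled loss from 1.57 W (one VR per core) to about 1.53 W.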
Accelerating graph computation with racetrack memory and pointer-assisted graph representation
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.172
Eunhyuk Park, S. Yoo, Sunggu Lee, Hai Helen Li
The poor performance of NAND flash memory, namely its long access latency and coarse access granularity, is the major bottleneck in graph processing. This paper proposes an intelligent storage system for graph processing based on fast, low-cost racetrack memory and a pointer-assisted graph representation. Our baseline is GraphChi, a graph processing framework that exploits sequential accesses on a conventional NAND flash-based SSD. Compared to GraphChi, the proposed racetrack-memory-based storage reduces the total processing time of three representative graph computations by 40.2% to 86.9%. The faster execution also reduces energy consumption by 39.6% to 90.0%. In-storage processing yields an additional 10.5% to 16.4% performance improvement and a 12.0% to 14.4% reduction in energy consumption.
Citations: 17
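The racetrack-memory entry's argument hinges on how a graph is laid out in storage. As a neutral reference point, the sketch below builds a compressed sparse row (CSR) layout and runs a breadth-first search over it. CSR is a common baseline layout. It is not the paper's pointer-assisted representation, which the abstract does not describe. Each vertex's neighbors sit in one contiguous slice of targets, but moving from vertex to vertex jumps to data-dependent offsets. This fine-grained, irregular access plausibly favors a low-latency medium over page-granular NAND flash.

```python
# Compressed sparse row (CSR) graph layout plus a BFS that walks it.
from collections import deque


def build_csr(num_vertices, edges):
    """edges: iterable of (src, dst). Returns (offsets, targets)."""
    edges = list(edges)                      # iterated twice below
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    targets = [0] * offsets[-1]
    cursor = offsets[:-1]                    # slice copy: next free slot per vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def bfs_levels(offsets, targets, root):
    """Breadth-first search; returns each vertex's hop distance (-1 if unreachable)."""
    level = [-1] * (len(offsets) - 1)
    level[root] = 0
    queue = deque([root])
    while queue:
        v = queue.popleft()
        # Neighbors of v are contiguous: targets[offsets[v]:offsets[v + 1]].
        for u in targets[offsets[v]:offsets[v + 1]]:
            if level[u] == -1:
                level[u] = level[v] + 1
                queue.append(u)
    return level
```

For edges (0,1), (0,2), (1,3), (2,3), (3,4) on six vertices, bfs_levels from vertex 0 returns [0, 1, 1, 2, 3, -1].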
Unified, ultra compact, quadratic power proxies for multi-core processors
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.347
M. Yasin, Anas Shahrour, I. Elfadel
Per-core power proxies for multi-core processors are known to use several dozen hardware activity monitors to achieve 2% accuracy in core power estimation. These activity monitors are typically not accessible to the user. Even if they were, using them at the kernel or OS level for power monitoring or control would incur significant overhead. Furthermore, when scaled up to hundreds of cores per chip, such power proxies become a computational bottleneck for power management operations such as chip power capping. In this paper, we show that per-core power estimation with 4% accuracy or better can be achieved using an ultra-compact power proxy. The proxy is based on a hybrid set of only four user-accessible parameters: core frequency, core temperature, instructions per cycle (IPC), and active-state residency. Our proxy is nonlinear and valid across all P- and C-states. It is built on a randomized power data collection strategy that aims to exercise all P and C levels of each core. We illustrate the model's accuracy using the full SPEC CPU 2006 benchmark suite on a 12-core processor.
Citations: 3
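For the power-proxy entry, ordinary least squares can fit a quadratic proxy over the four parameters the abstract names. The pure-Python sketch below expands each sample into the 15 terms of a full quadratic in four variables: a constant, four linear terms, four squares and six pairwise products. It then solves the normal equations. The basis, the solver, and the assumption that features are pre-scaled to roughly [0, 1] are illustrative choices. They are not the authors' published model or data collection strategy.

```python
# Least-squares fit of a quadratic power proxy in four per-core features
# (frequency, temperature, IPC, active-state residency), each pre-scaled to
# roughly [0, 1] so the normal equations stay well-conditioned.
# Generic regression sketch; not the authors' model.
from itertools import combinations_with_replacement


def quad_terms(x):
    """[1, x_i..., x_i * x_j for i <= j]: 15 terms for 4 features."""
    terms = [1.0] + list(x)
    terms += [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]
    return terms


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ w = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_proxy(samples, powers):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [quad_terms(x) for x in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, powers)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, x):
    """Evaluate the fitted quadratic proxy at feature vector x."""
    return sum(wi * ti for wi, ti in zip(w, quad_terms(x)))
```

A quick sanity check before using real measurements: fit noise-free samples generated from a known quadratic. The fit should recover that quadratic to within floating-point error.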
Novel circuit topology synthesis method using circuit feature mining and symbolic comparison
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.030
C. Ferent, A. Doboli
This paper presents a reasoning-based approach to analog circuit synthesis. It uses ordered node clustering representations (ONCR) to describe alternative circuit features, and symbolic circuit comparison to characterize the performance trade-offs of synthesized solutions. Case studies illustrate the application of the proposed methods to topology selection and refinement.
Citations: 11
Fast STA prediction-based gate-level timing simulation
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.261
T. B. Ahmad, M. Ciesielski
Traditional dynamic simulation with standard delay format (SDF) back-annotation cannot be performed reliably on large designs. The large SDF files make event-driven timing simulation extremely slow, because it must process an excessive number of events. To accelerate gate-level timing simulation, we propose an automated, fast, prediction-based gate-level timing simulation that combines static timing analysis (STA) at the block level with dynamic timing simulation at the I/O interfaces. We demonstrate that the proposed timing simulation can be performed earlier in the design cycle, in parallel with synthesis.
Citations: 6
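The static half of the timing-simulation entry rests on block-level STA. As a reminder of what STA computes, the sketch below propagates latest arrival times through a combinational DAG in topological order. Node names and delays are hypothetical. The sketch omits slews, setup/hold checks and the paper's prediction step, which the abstract does not detail.

```python
# Static timing analysis core: latest arrival times on a gate-level DAG.
from collections import defaultdict, deque


def arrival_times(edges, input_arrivals):
    """edges: list of (driver, load, delay_ns); input_arrivals: {primary_input: t_ns}.

    Propagates max(arrival(driver) + delay) in topological order (Kahn's algorithm)
    and returns {node: latest arrival time}. Assumes non-negative delays and input
    arrival times. Raises ValueError on a combinational loop.
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(input_arrivals)
    for u, v, d in edges:
        fanout[u].append((v, d))
        indegree[v] += 1
        nodes.update((u, v))
    arrival = {n: input_arrivals.get(n, 0.0) for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v, d in fanout[u]:
            arrival[v] = max(arrival[v], arrival[u] + d)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("combinational loop: graph is not a DAG")
    return arrival
```

As an example, suppose inputs a, b, c arrive at 0.0, 0.5 and 3.0 ns, gate g1 is driven by a (1.0 ns) and b (2.0 ns), and g2 is driven by g1 (1.5 ns) and c (0.5 ns). Then g1 settles at 2.5 ns and g2 at 4.0 ns.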
P/G TSV planning for IR-drop reduction in 3D-ICs
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.057
Shengcheng Wang, F. Firouzi, Fabian Oboril, M. Tahoori
In recent years, interconnect issues have emerged as major performance challenges for two-dimensional integrated circuits (2D-ICs). In this context, three-dimensional ICs (3D-ICs), which consist of several active layers stacked on top of each other, offer a very attractive alternative to conventional 2D-ICs. However, 3D-ICs also face many power distribution network (PDN) design challenges, owing to their higher power density and larger supply current compared to 2D-ICs. Power/ground (P/G) through-silicon vias (TSVs) are an important part of 3D-IC PDNs and must be managed carefully. Excessive or ill-placed P/G TSVs degrade power integrity (e.g., IR-drop) and also consume a considerable amount of chip real estate. In this work, we propose a mixed-integer linear programming (MILP)-based technique to plan the P/G TSVs. The goal of our approach is to minimize the average IR-drop by optimizing P/G TSV placement while satisfying the total TSV area constraint. To this end, the locations, sizes, and total number of P/G TSVs are co-optimized simultaneously. Experimental results show that the proposed method reduces the average IR-drop by 11.8% on average compared to a random placement technique, with a much smaller runtime.
Citations: 9
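The placement trade-off in the TSV-planning entry can be illustrated on a tiny instance by exhaustive search. This is a swapped-in technique: the paper uses an MILP, which this sketch does not reproduce. The IR-drop model is deliberately crude and hypothetical. It charges each tile a drop proportional to its current and its Manhattan distance to the nearest P/G TSV, and a cap on the TSV count stands in for the total area budget. Real PDN analysis solves a resistive mesh instead.

```python
# Toy P/G TSV site selection by exhaustive search (stand-in for the paper's MILP).
# Crude hypothetical IR-drop model: a tile drawing current i at Manhattan distance d
# from its nearest P/G TSV sees drop = i * R_UNIT * (d + 1).
from itertools import combinations

R_UNIT = 0.01  # ohms per tile hop (assumed)


def avg_ir_drop(tiles, tsvs):
    """tiles: {(x, y): current_A}; tsvs: iterable of (x, y). Returns mean drop in volts."""
    total = 0.0
    for (x, y), current in tiles.items():
        d = min(abs(x - tx) + abs(y - ty) for tx, ty in tsvs)
        total += current * R_UNIT * (d + 1)
    return total / len(tiles)


def plan_tsvs(tiles, candidates, max_tsvs):
    """Try every subset of 1..max_tsvs candidate sites; keep the lowest average drop.

    max_tsvs plays the role of the total-area budget. Exponential: tiny grids only.
    """
    best_sites, best_drop = None, float("inf")
    for k in range(1, max_tsvs + 1):
        for sites in combinations(candidates, k):
            drop = avg_ir_drop(tiles, sites)
            if drop < best_drop:
                best_sites, best_drop = sites, drop
    return best_sites, best_drop
```

On a five-tile row whose only load is a 10 A hotspot at the far end, the search places the single allowed TSV on the hotspot. Further TSVs do not lower the modeled average drop, so a strictly-better rule keeps the smaller set.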
A novel embedded system for vision tracking
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.353
A. Nikitakis, Theofilos Paganos, I. Papaefstathiou
One of the most important challenges in computer vision is implementing low-power embedded systems that execute very accurate yet real-time algorithms. In visual tracking, one of the most promising approaches is the recently introduced OpenTLD algorithm, which uses a random forest classification method. Although it is very robust, it cannot be parallelized efficiently in its native form, because its memory access pattern makes it hard to take advantage of conventional memory hierarchies. In this paper, we present a novel embedded system implementing this algorithm. We accelerate the algorithm's bottleneck by designing and implementing a high-bandwidth distributed memory subsystem that is independent of the various software parameters. We demonstrate the applicability and efficiency of this approach by implementing our scheme on a modern FPGA.
Citations: 1
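The memory-access problem in the vision-tracking entry can be seen in a simplified fern-style ensemble lookup, the kind of pixel-comparison classifier used inside TLD-type trackers. Each base classifier turns a handful of pixel comparisons into a binary code, and that code indexes a posterior table. The index depends on image content, so successive lookups land at effectively unpredictable table addresses, which caches and prefetchers tend to handle poorly. The comparison pairs and posterior values are hypothetical. The sketch says nothing about the paper's distributed memory subsystem.

```python
# Simplified fern-style ensemble classifier lookup (hypothetical parameters).


def fern_code(patch, comparisons):
    """Turn pixel-pair comparisons into a binary code (one bit per comparison)."""
    code = 0
    for (r1, c1), (r2, c2) in comparisons:
        code = (code << 1) | (1 if patch[r1][c1] > patch[r2][c2] else 0)
    return code


def ensemble_confidence(patch, ferns, posteriors):
    """Average the posteriors selected by each fern's code.

    The table index depends on the image content, so successive lookups hit
    data-dependent addresses in each posterior table.
    """
    total = 0.0
    for comparisons, table in zip(ferns, posteriors):
        total += table[fern_code(patch, comparisons)]
    return total / len(ferns)
```

Consider a 2x2 patch [[5, 1], [3, 7]] and two ferns. The first fern (comparisons (0,0) vs (0,1) and (1,0) vs (1,1)) yields code 2. The second fern ((0,1) vs (1,1)) yields code 0. The confidence is the mean of the two selected posteriors.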
Battery aware stochastic QoS boosting in mobile computing devices
Pub Date : 2014-03-24 DOI: 10.5555/2616606.2616818
Hao Shen, Qiuwen Chen, Qinru Qiu
Mobile computing has become woven into everyday life to a great extent, and the way each person uses a mobile device clearly bears that user's personal signature. The ability to learn this signature holds immense potential for workload prediction and resource management. In this work, we investigate user behavior modeling and apply the resulting model to energy management. Our goal is to maximize the quality of service (QoS) provided by the mobile device (i.e., a smartphone) while keeping the risk of battery depletion below a given threshold. A Markov decision process (MDP) is constructed from historical user behavior, and the optimal management policy is solved using linear programming. Simulations based on real user traces compare the approach with existing battery energy management techniques. They confirm that the stochastic control boosts the mobile device's QoS more effectively without significantly increasing the chance of battery depletion.
Citations: 3
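The policy-solving step of the battery-aware QoS entry can be sketched on a toy model. The abstract builds an MDP from user traces and solves it with linear programming under a depletion-risk threshold. The sketch below swaps in a simpler technique: finite-horizon dynamic programming (backward induction), with a fixed depletion penalty in place of the risk constraint. All rewards, drain rates and the activity probability are hypothetical rather than learned from traces.

```python
# Finite-horizon MDP for battery-aware QoS selection, solved by backward induction.
# Swapped-in technique: a depletion penalty plus dynamic programming stands in for
# the paper's LP with a depletion-risk threshold. All numbers are hypothetical.

QOS = [1.0, 2.0, 3.0]   # reward per active step at each QoS level (assumed)
COST = [1, 2, 4]        # battery units drained per active step (assumed)
P_ACTIVE = 0.5          # probability the user is active in a step (assumed)
PENALTY = 50.0          # charged once if an active step cannot be powered (assumed)


def solve_policy(battery_max, horizon):
    """Return (value, policy) indexed as [t][b].

    t counts remaining steps until the next charge; b is remaining battery units;
    policy[t][b] is the index of the best QoS level in that state.
    """
    value = [[0.0] * (battery_max + 1) for _ in range(horizon + 1)]
    policy = [[0] * (battery_max + 1) for _ in range(horizon + 1)]
    for t in range(1, horizon + 1):
        for b in range(battery_max + 1):
            best_q, best_a = None, 0
            for a, (reward, cost) in enumerate(zip(QOS, COST)):
                if b >= cost:
                    active = reward + value[t - 1][b - cost]
                else:
                    active = -PENALTY          # battery depleted; episode ends
                idle = value[t - 1][b]         # idle steps drain nothing here
                q = P_ACTIVE * active + (1 - P_ACTIVE) * idle
                if best_q is None or q > best_q:
                    best_q, best_a = q, a
            value[t][b] = best_q
            policy[t][b] = best_a
    return value, policy
```

With four battery units left, the policy picks the highest QoS level when one step remains before charging. With ten steps remaining it backs off to a lower level, because the expected depletion penalty then dominates.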
Exploiting expendable process-margins in DRAMs for run-time performance optimization
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.186
K. Chandrasekar, Sven Goossens, C. Weis, Martijn Koedam, B. Akesson, N. Wehn, K. Goossens
Manufacturing-time process (P) variations and run-time voltage (V) and temperature (T) variations can severely affect a DRAM's performance. To counter these effects, DRAM vendors provide substantial design-time PVT timing margins that guarantee correct DRAM operation under worst-case operating conditions. Unfortunately, with technology scaling, these timing margins have become large and very pessimistic for the majority of manufactured DRAMs. Run-time variations depend on operating conditions, which makes their margins difficult to optimize. Process variations, by contrast, are manufacturing-time effects, so excess process margins can be reduced at run time on a per-device basis if they are properly identified. In this paper, we propose a generic post-manufacturing performance characterization methodology for DRAMs. It identifies this excess process margin for any given DRAM device at run time, while retaining the requisite margins for voltage (noise) and temperature variations. In doing so, the methodology ascertains the actual impact of process variations on the particular DRAM device and optimizes its access latencies (timings), thereby improving its overall performance.
We evaluate this methodology on 48 DDR3 devices (from 12 DIMMs) and verify the derived timings under worst-case operating conditions, showing up to 33.3% and 25.9% reductions in DRAM read and write latencies, respectively.
Citations: 84
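The core loop of the DRAM entry's per-device margin characterization can be sketched as a downward sweep. The sweep lowers one timing parameter a cycle at a time while a stress test keeps passing. It then adds back a guard band for voltage and temperature before using the result. In the sketch below, passes_test is a hypothetical callback. It stands in for programming the memory controller and running a read/write pattern test. guard_cycles is an assumed V/T guard band. This is a generic illustration rather than the paper's methodology, which also verifies the derived timings under worst-case conditions.

```python
# Per-device timing characterization sketch: find the smallest passing value of
# one DRAM timing parameter, then add back a voltage/temperature guard band.
# passes_test is a hypothetical callback; here it would program the controller
# and run a pattern test, in the test below it is simulated.
from typing import Callable


def characterize(datasheet_cycles: int, guard_cycles: int,
                 passes_test: Callable[[int], bool], trials: int = 3) -> int:
    """Return a reduced timing in clock cycles, never above the datasheet value.

    Steps down one cycle at a time while every trial passes; the last passing
    value plus guard_cycles is returned, capped at the datasheet value.
    """
    best = datasheet_cycles
    candidate = datasheet_cycles - 1
    while candidate >= 1 and all(passes_test(candidate) for _ in range(trials)):
        best = candidate
        candidate -= 1
    return min(datasheet_cycles, best + guard_cycles)
```

Consider a simulated device whose true minimum is 7 cycles against a datasheet value of 11. Here characterize(11, 1, lambda t: t >= 7) returns 8, about a 27% reduction in that one parameter. A device with no spare margin stays at 11.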