2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC): Latest Papers

An approximate computing technique for reducing the complexity of a direct-solver for sparse linear systems in real-time video processing

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2593082

Michael Schaffner, Frank K. Gürkaynak, A. Smolic, H. Kaeslin, L. Benini

Many video processing algorithms are formulated as least-squares problems that result in large, sparse linear systems. Solving such systems in real time is very demanding. This paper focuses on reducing the computational complexity of a direct Cholesky-decomposition-based solver. Our approximation scheme builds on the observation that, in well-conditioned problems, many elements in the decomposition nearly vanish. Such elements may be pruned from the dependency graph with mild accuracy degradation. Using an example from image-domain warping, we show that pruning reduces the number of operations per solve by over 75%, resulting in significant savings in computing time, area or energy.
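The pruning idea in this abstract can be illustrated with a toy sketch. The code below is not the authors' solver: it is a dense, pure-Python Cholesky factorization of a small, strongly diagonally dominant grid matrix (a hypothetical stand-in for a well-conditioned least-squares system), in which factor entries below an assumed threshold `tol` are simply set to zero. All function names, the grid size, the diagonal value and the threshold are assumptions chosen for illustration.

```python
import math

def grid_matrix(w, h, diag):
    """SPD test matrix for a w-by-h grid: `diag` on the diagonal, -1 between neighbors."""
    n = w * h
    a = [[0.0] * n for _ in range(n)]
    for r in range(h):
        for c in range(w):
            i = r * w + c
            a[i][i] = diag
            if c + 1 < w:
                a[i][i + 1] = a[i + 1][i] = -1.0
            if r + 1 < h:
                a[i][i + w] = a[i + w][i] = -1.0
    return a

def cholesky(a):
    """Dense Cholesky factorization A = L L^T, returning the lower-triangular L."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def prune(l, tol):
    """Zero out factor entries whose magnitude is below tol (assumed pruning rule)."""
    return [[v if abs(v) >= tol else 0.0 for v in row] for row in l]

def solve(l, b):
    """Solve L L^T x = b by forward and backward substitution, skipping zero entries."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        acc = sum(l[i][k] * y[k] for k in range(i) if l[i][k] != 0.0)
        y[i] = (b[i] - acc) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = sum(l[k][i] * x[k] for k in range(i + 1, n) if l[k][i] != 0.0)
        x[i] = (y[i] - acc) / l[i][i]
    return x

def nnz(l):
    """Number of nonzero entries in the factor (a proxy for operations per solve)."""
    return sum(1 for row in l for v in row if v != 0.0)

def relative_error(x, x_ref):
    """Euclidean norm of (x - x_ref) divided by the norm of x_ref."""
    num = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, x_ref)))
    return num / math.sqrt(sum(q * q for q in x_ref))

if __name__ == "__main__":
    a = grid_matrix(4, 4, 8.0)
    b = [float(i + 1) for i in range(16)]
    full = cholesky(a)
    pruned = prune(full, 0.01)
    print("nonzeros:", nnz(full), "->", nnz(pruned))
    print("relative error:", relative_error(solve(pruned, b), solve(full, b)))
```

In this sketch the fill-in entries of the factor decay quickly away from the original nonzero pattern, so dropping them removes work from both substitution passes while perturbing the solution only slightly. The paper prunes the dependency graph of the decomposition itself, which also saves work during factorization; this example only prunes after the fact.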

Citations: 11

ePlace: Electrostatics based placement using Nesterov's method

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2593133

Jingwei Lu, Pengwen Chen, Chin-Chih Chang, Lu Sha, D. J. Huang, C. Teng, Chung-Kuan Cheng

ePlace is a generalized analytic algorithm to handle large-scale standard-cell and mixed-size placement. We use a novel density function based on electrostatics to remove overlap and Nesterov's method to minimize the nonlinear cost. The step length is estimated as the inverse of the Lipschitz constant, which is determined by our dynamic prediction and backtracking method. An approximated preconditioner is proposed to resolve the difference between large macros and standard cells, while an annealing engine is devised to handle macro legalization followed by placement of standard cells. The above innovations are integrated into our placement prototype ePlace, which outperforms the leading-edge placers on the respective standard-cell and mixed-size benchmark suites. Specifically, ePlace produces 2.83%, 4.59% and 7.13% shorter wirelength while running 3.05×, 2.84× and 1.05× faster than BonnPlace, MAPLE and NTUplace3-unified, on average over the ISPD 2005, ISPD 2006 and MMS circuits, respectively.
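As a rough illustration of the optimizer described in this abstract, the sketch below runs Nesterov's accelerated gradient method with the step length set to the inverse of a Lipschitz estimate found by backtracking. It is not ePlace's implementation: the cost is a hypothetical three-variable quadratic (`CURV`, `cost` and `grad` are placeholders for the placement objective), and the backtracking rule is the standard sufficient-decrease test with doubling, which stands in for the paper's dynamic prediction scheme.

```python
import math

# Hypothetical curvatures of a toy quadratic cost; its true Lipschitz constant is 100.
CURV = [1.0, 10.0, 100.0]

def cost(x):
    """Toy convex cost: 0.5 * sum(c_i * x_i^2)."""
    return 0.5 * sum(c * v * v for c, v in zip(CURV, x))

def grad(x):
    """Gradient of the toy cost."""
    return [c * v for c, v in zip(CURV, x)]

def nesterov(x0, lipschitz0=1.0, iters=500):
    """Nesterov's method with step length 1/lip, where lip is grown by backtracking."""
    x = list(x0)   # main iterate
    y = list(x0)   # look-ahead (momentum) point
    t = 1.0        # momentum parameter
    lip = lipschitz0
    for _ in range(iters):
        g = grad(y)
        g2 = sum(v * v for v in g)
        if g2 == 0.0:
            x = y  # y is already a stationary point
            break
        while True:
            # Gradient step from y with step length 1/lip.
            x_new = [yi - gi / lip for yi, gi in zip(y, g)]
            # Accept once the sufficient-decrease condition holds; otherwise the
            # Lipschitz estimate was too small, so double it and retry.
            if cost(x_new) <= cost(y) - g2 / (2.0 * lip):
                break
            lip *= 2.0
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_new
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x, lip

if __name__ == "__main__":
    x_final, lip_final = nesterov([1.0, 1.0, 1.0])
    print("final cost:", cost(x_final), "Lipschitz estimate:", lip_final)
```

Keeping the Lipschitz estimate non-decreasing, as done here, is the conservative textbook choice; a placer would more plausibly re-estimate it at every iteration so that the step can grow again once the cost landscape flattens.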

Citations: 46

On the scheduling of fault-tolerant mixed-criticality systems

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2593169

Pengcheng Huang, Hoeseok Yang, L. Thiele

In this paper, we consider fault-tolerant mixed-criticality scheduling, where heterogeneous safety guarantees must be provided to functionalities (tasks) of varying criticality (importance). We explicitly model the safety requirements for tasks of different criticalities according to safety standards, assuming hardware transient faults. We further provide analysis techniques to bound the effects of task killing and service degradation on system safety and schedulability. Based on our model and analysis, we show that our problem can be converted to a conventional mixed-criticality scheduling problem. Thus, we broaden the scope of applicability of conventional mixed-criticality scheduling techniques. Our proposed techniques are validated with a realistic flight management system application and extensive simulations.

Citations: 48

Run-time technique for simultaneous aging and power optimization in GPGPUs

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2593208

Xiaoming Chen, Yu Wang, Yun Liang, Yuan Xie, Huazhong Yang

High-performance general-purpose graphics processing units (GPGPUs) may suffer from serious power and negative bias temperature instability (NBTI) problems. In this paper, we propose a framework for run-time aging and power optimization. Our technique is based on the observation that many GPGPU applications achieve optimal performance with only a portion of cores due to either bandwidth saturation or shared resource contention. During run-time, given the dynamically tracked NBTI-induced threshold voltage shift and the problem size of GPGPU applications, our algorithm returns the optimal number of cores using detailed performance modeling. The unused cores are power-gated for power saving and NBTI recovery. Experiments show that our proposed technique achieves on average 34% reduction in NBTI-induced threshold voltage shift and 19% power reduction, while the average performance degradation is less than 1%.

Citations: 32

VIX: Virtual Input Crossbar for efficient switch allocation

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2593242

S. Srikiran Rao, Supreet Jeloka, R. Das, D. Blaauw, R. Dreslinski, T. Mudge

Separable allocators in on-chip routers perform switch allocation in two stages that often make uncoordinated decisions resulting in sub-optimal switch allocation. We propose Virtual Input Crossbars (VIX), where more than one virtual channel (VC) of an input port is connected to the crossbar. VIX improves switch allocation by allowing more than one input VC of an input port to transmit flits in the same cycle. Also, more input VCs can participate in the output arbitration, reducing the chances of uncoordinated decisions. VIX improves network throughput by more than 15% for the topologies studied without affecting the router critical path.

Citations: 22

Synthesis of PCHB-WCHB hybrid quasi-delay insensitive circuits

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2593224

C. Chuang, Yi-Hsiang Lai, J. H. Jiang

The increasing cost paid in clocking integrated circuits and combating timing variations forces designers to rethink asynchronous approaches to system realization. Among various techniques, quasi-delay-insensitive (QDI) design is promising due to its very relaxed timing assumption. Its expensive logic overhead, however, often nullifies its promise of performance and power improvements, and remains a major obstacle against its adoption. To overcome this obstacle, this paper proposes an efficient static performance analysis procedure and a synthesis flow for precharged half buffer (PCHB) and weak-conditioned half buffer (WCHB) circuit optimization. Experimental results demonstrate efficient performance analysis and effective area reduction under pipeline cycle time constraints.

Citations: 9

SHiFA: System-level hierarchy in run-time fault-aware management of many-core systems

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2593214

Mohammad Fattah, M. Palesi, P. Liljeberg, J. Plosila, H. Tenhunen

A system-level approach to fault-aware resource management of many-core systems is proposed. The proposed approach, called SHiFA, is able to tolerate run-time faults at system level without any hardware overhead. In contrast to the existing system-level methods, network resources are also considered to be potentially faulty. Accordingly, applications are mapped onto healthy nodes of the system at run-time such that their interaction will not require the use of faulty elements. With a simple routing approach, results show 100% utilizability of PEs and 99.41% successful mappings when up to 8 links are broken. The SHiFA design is based on distributed operating systems, so that it remains scalable for future many-core systems. A significant improvement in scalability properties is observed compared to the state-of-the-art distributed approaches.

Citations: 12

An HDL-based system design methodology for multistandard RF SoC's

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2593089

A. Atac, Zhimiao Chen, Lei Liao, Yifan Wang, M. Schleyer, Ye Zhang, R. Wunderlich, S. Heinen

Multistandard SoCs that combine advanced RF and analog circuitry with digital blocks are pervasive in modern ICs. However, the system design and verification methodologies that capture the complexity of multistandard RF SoCs are still limited. In this paper, an HDL design methodology is introduced for multistandard RF SoCs, which covers all design layers, from system design to automatic extraction of models from circuits and systematic top-level verification. The proposed HDL-based design methodology combines top-down and bottom-up design approaches, and brings design and verification closer by automatically reflecting the circuits in models via an Automatic Parameter Extraction (APX) tool. System- or block-level verification with these models is obtained automatically through overnight runs, without the need for extra test benches or designer interaction. This enables short-term detection of functional errors or performance losses. A multimode Bluetooth transceiver SoC was designed and fabricated in 8 months, with a first-time tape-out, using the proposed methodology. The system-level simulations show a very good match with the measurement results after fabrication. The test SoC is fabricated in 0.13 μm CMOS technology.

Citations: 2

Computing with hybrid CMOS/STO circuits

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2596673

M. Kabir, M. Stan

Recent research in spin torque nano-oscillators (STNOs) has opened the possibility of using electron spin to generate sustained microwave oscillations. Furthermore, the experimental verification of synchronization of STNOs could allow communication and computation with nanoscaled oscillators. In this paper, we propose a hybrid MOSFET/STNO array which can be used for pattern recognition applications. First, we show that an array of electrically coupled STNOs obeys the dynamics of Kuramoto's weakly coupled oscillators [1]. This behavior allows us to use the STNO array to implement the oscillatory neurocomputer proposed by Hoppensteadt et al. [2]. We next consider practical STNO device geometries which can be used in a parallel-connected array. We propose using a dual barrier magnetic tunnel junction (DMTJ) to produce strong, harmonic oscillation signals in the absence of an external magnetic field. Finally, we perform HSPICE simulations of a hybrid MOSFET/STNO array to show how it can be used for pattern recognition.
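For readers unfamiliar with the Kuramoto dynamics mentioned in this abstract, the following sketch integrates the textbook all-to-all Kuramoto model, dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i), with a simple forward-Euler step. It is a generic illustration, not a model of the STNO devices or of the paper's HSPICE setup; the coupling strength, time step, initial phases and function names are all assumed values chosen to show synchronization.

```python
import cmath
import math

def kuramoto_step(theta, omega, k, dt):
    """One forward-Euler step of the all-to-all Kuramoto model."""
    n = len(theta)
    new_theta = []
    for i in range(n):
        coupling = sum(math.sin(theta[j] - theta[i]) for j in range(n))
        new_theta.append(theta[i] + dt * (omega[i] + (k / n) * coupling))
    return new_theta

def order_parameter(theta):
    """Magnitude of the mean phasor: 1.0 means the phases are fully aligned."""
    return abs(sum(cmath.exp(1j * t) for t in theta)) / len(theta)

def simulate(theta0, omega, k, dt=0.01, steps=2000):
    """Integrate the model from theta0 and return the final phases."""
    theta = list(theta0)
    for _ in range(steps):
        theta = kuramoto_step(theta, omega, k, dt)
    return theta

if __name__ == "__main__":
    theta0 = [0.0, 0.5, 1.0, 1.5, 2.0]  # assumed initial phases (radians)
    omega = [1.0] * 5                   # identical natural frequencies
    final = simulate(theta0, omega, k=1.0)
    print("order parameter before:", order_parameter(theta0))
    print("order parameter after: ", order_parameter(final))
```

With identical natural frequencies and positive coupling, the phases lock and the order parameter approaches 1. Oscillatory pattern recognition schemes of the kind cited in the abstract rely on this locking behavior, with the pattern encoded in the coupling between oscillators.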

Citations: 7

Power-aware deployment and control of forced-convection and thermoelectric coolers

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)

Pub Date : 2014-06-01 DOI: 10.1145/2593069.2593186

M. Dousti, Massoud Pedram

Advances in thermoelectric cooling technology have made it one of the promising solutions for spot cooling in VLSI circuits. Thermoelectric coolers (TECs) generate heat during their operation. This heat, plus the heat generated in the circuit, should be transferred to the ambient environment in order to avoid high die temperatures. This paper describes a hybrid cooling solution in which TECs are augmented with forced-convection coolers (fans). Specifically, an optimization framework called OFTEC is presented which finds the optimum TEC driving current and fan speed to minimize the overall power consumption of the cooling system while maintaining safe die temperatures. Simulation results on a set of eight benchmarks show the benefits of the proposed approach. In particular, a baseline system without TECs but with a fan could meet the thermal constraint for only three of the benchmarks, whereas the OFTEC solution satisfied thermal constraints for all benchmarks. In addition, OFTEC resulted in 5.4% less average power consumption for the aforesaid three benchmarks while lowering the maximum die temperature by an average of 3.7°C.

Citations: 9
