2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Articles


An ultra-compact virtual source FET model for deeply-scaled devices: Parameter extraction and validation for standard cell libraries and digital circuits

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509649

Li Yu, O. Mysore, Lan Wei, L. Daniel, D. Antoniadis, I. Elfadel, D. Boning

In this paper, we present the first validation of the virtual source (VS) charge-based compact model for standard cell libraries and large-scale digital circuits. With only a modest number of physically meaningful parameters, the VS model accounts for the main short-channel effects in nanometer technologies. Using a novel DC and transient parameter extraction methodology, the model is verified with simulated data from a well-characterized, industrial 40-nm bulk silicon model. The VS model is used to fully characterize a standard cell library with timing comparisons showing less than 2.7% error with respect to the industrial design kit. Furthermore, a 1001-stage inverter chain and a 32-bit ripple-carry adder are employed as test cases in a vendor CAD environment to validate the use of the VS model for large-scale digital circuit applications. Parametric Vdd sweeps show that the VS model is also ready for usage in low-power design methodologies. Finally, runtime comparisons have shown that the use of the VS model results in a speedup of about 7.6×.


Citations: 5

Compiler-assisted refresh minimization for volatile STT-RAM cache

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509608

Qing'an Li, Jianhua Li, Liang Shi, C. Xue, Yiran Chen, Yanxiang He

Spin-Transfer Torque RAM (STT-RAM) has been proposed for building on-chip caches because of its attractive features: high storage density and negligible leakage power. Recently, researchers have proposed improving the write performance of STT-RAM by relaxing its non-volatility property. To avoid data loss resulting from volatility, refresh schemes have been proposed. However, refresh operations consume additional energy. In this paper, we propose to reduce the number of refresh operations by re-arranging the program data layout at compilation time. An N-refresh scheme is also proposed. Experimental results show that, on average, the proposed methods can reduce the number of refresh operations by 73.3% and the dynamic energy consumption by 27.6%.


Citations: 29

Performance bound and yield analysis for analog circuits under process variations

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509692

Xuexin Liu, A. A. Palma-Rodriguez, S. Rodriguez-Chavez, S. Tan, E. Tlelo-Cuautle, Yici Cai

Yield estimation for analog integrated circuits is crucial for analog circuit design and optimization in the presence of process variations. In this paper, we present a novel analog yield estimation method based on a performance bound analysis technique in the frequency domain. The new method first derives the transfer functions of linear (or linearized) analog circuits via a graph-based symbolic analysis method. Then the frequency response bounds of the transfer functions, in terms of magnitude and phase, are obtained by a nonlinear constrained optimization technique. To predict the yield rate, the bound information is used to calculate Gaussian distribution functions. Experimental results show that the new method achieves similar accuracy while delivering a 20-times speedup over HSPICE Monte Carlo simulation on some typical analog circuits.
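The abstract does not spell out how the bounds are turned into Gaussian distribution functions, so the following is only a minimal sketch under a loudly stated assumption: the computed lower and upper performance bounds are taken to sit at the mean minus and plus k standard deviations (k = 3 by default). The function names, the parameter `k`, and the spec limits are hypothetical and are not taken from the paper.

```python
from math import erf, sqrt


def gaussian_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sigma * sqrt(2.0))))


def yield_from_bounds(lower, upper, spec_lo, spec_hi, k=3.0):
    """Estimate yield as P(spec_lo <= X <= spec_hi) for Gaussian X.

    ASSUMPTION (not from the paper): the performance bounds [lower, upper]
    correspond to mean -/+ k standard deviations.
    """
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    mean = 0.5 * (lower + upper)
    sigma = (upper - lower) / (2.0 * k)
    return gaussian_cdf(spec_hi, mean, sigma) - gaussian_cdf(spec_lo, mean, sigma)


if __name__ == "__main__":
    # Hypothetical magnitude bounds in dB and a hypothetical spec window.
    print(yield_from_bounds(lower=-3.0, upper=3.0, spec_lo=-1.0, spec_hi=1.0))
```

Under this assumption a single closed-form evaluation replaces the many circuit simulations that a Monte Carlo estimate would need, which is the kind of saving the abstract's speedup claim points to.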


Citations: 11

MD: Minimal path-based fault-tolerant routing in on-Chip Networks

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509555

M. Ebrahimi, M. Daneshtalab, J. Plosila, Farhad Mehdipour

The communication requirements of many-core embedded systems are met by the emerging Network-on-Chip (NoC) paradigm. As on-chip communication reliability is a crucial factor in many-core systems, the NoC paradigm should address reliability issues. Using fault-tolerant routing algorithms to reroute packets around faulty regions increases packet latency and creates congestion around the faulty region. At the same time, NoC performance is highly affected by network congestion: congestion increases the delay of packets routed from a source to a destination, so it should be avoided. In this paper, a minimal and defect-resilient (MD) routing algorithm is proposed to route packets adaptively through the shortest paths in the presence of a faulty link, as long as a path exists. To avoid congestion, output channels can be adaptively chosen whenever the distance from the current node to the destination node is greater than one hop along both directions. In addition, an analytical model is presented to evaluate MD for two-fault cases.
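As a rough illustration of minimal adaptive routing around a faulty link, here is a small Python sketch for a 2D mesh. It is not the MD algorithm itself: it only lists the output directions that bring a packet one hop closer to its destination, drops those that cross a faulty link, and picks the least congested of the rest. The direction labels, the `faulty_links` representation, and the `congestion` dictionary are assumptions made for this example, and the sketch makes no claim about deadlock freedom or about always finding a minimal path.

```python
# Coordinate offsets for each output direction in a 2D mesh (hypothetical labels).
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def minimal_directions(cur, dst):
    """Return the directions that move one hop closer to dst."""
    (cx, cy), (dx, dy) = cur, dst
    dirs = []
    if dx > cx:
        dirs.append("E")
    elif dx < cx:
        dirs.append("W")
    if dy > cy:
        dirs.append("N")
    elif dy < cy:
        dirs.append("S")
    return dirs


def choose_output(cur, dst, faulty_links, congestion):
    """Pick a healthy minimal direction, preferring the least congested one.

    faulty_links: set of frozenset({node_a, node_b}) pairs (assumed encoding).
    congestion:   dict mapping direction -> load estimate (assumed metric).
    Returns None if cur == dst or every minimal direction is faulty.
    """
    candidates = []
    for d in minimal_directions(cur, dst):
        nxt = (cur[0] + STEP[d][0], cur[1] + STEP[d][1])
        if frozenset((cur, nxt)) not in faulty_links:
            candidates.append(d)
    if not candidates:
        return None
    return min(candidates, key=lambda d: congestion.get(d, 0))


if __name__ == "__main__":
    broken = {frozenset(((0, 0), (1, 0)))}  # east link of node (0, 0) is faulty
    print(choose_output((0, 0), (2, 2), broken, {}))
```

When the destination is more than one hop away in both dimensions, two minimal directions are usually available, which is where the congestion-based choice has room to act.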


Citations: 56

Register and thread structure optimization for GPUs

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509639

Yun Liang, Zheng Cui, K. Rupnow, Deming Chen

GPUs are an increasingly popular implementation platform for a variety of general-purpose applications, from mobile and embedded devices to high-performance computing. The CUDA and OpenCL parallel programming models enable easy utilization of the GPU's resources. However, tuning GPU applications' performance is a complex and labor-intensive task. Software programmers employ a variety of optimization techniques to explore tradeoffs between thread parallelism and the performance of a single thread. However, prior techniques ignore register allocation, which is a significant factor in single-thread performance and indirectly affects the number of simultaneously active threads. In this paper, we show that joint optimization of register allocation and thread structure has great potential to significantly improve performance. However, the design space for this joint optimization can be large; therefore, we develop performance metrics appropriate for evaluation within a compiler's inner loop and efficient design space exploration techniques that use the metrics to narrow the search space. Across a range of GPU applications, we achieve an average performance speedup of 1.33X (up to 1.73X), with design space exploration 355X faster than exhaustive search.


Citations: 7

A computational model for SAT-based verification of hardware-dependent low-level embedded system software

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509684

Bernard Schmidt, Carlos Villarraga, J. Bormann, D. Stoffel, Markus Wedler, W. Kunz

This paper describes a method to generate a computational model for formal verification of hardware-dependent software in embedded systems. The computational model of the combined HW/SW system is a program netlist (PN) consisting of instruction cells connected in a directed acyclic graph that compactly represents all execution paths of the software. The model can be easily integrated into SAT-based verification environments such as those based on Bounded Model Checking (BMC). The proposed construction of the model, however, allows for an efficient reasoning of the SAT solver over entire execution paths. We demonstrate the efficiency of our approach by presenting experimental results from the formal verification of an industrial LIN (Local Interconnect Network) bus node, implemented as a software driver on a 32-bit RISC machine.


Citations: 8

Thermal simulator of 3D-IC with modeling of anisotropic TSV conductance and microchannel entrance effects

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509643

H. Qian, Hao Liang, Chip-Hong Chang, Wei Zhang, Hao Yu

This paper presents a fast and accurate steady-state thermal simulator for heatsink- and microfluid-cooled 3D-ICs. The model considers the thermal effect of TSVs at fine granularity by calculating the anisotropic equivalent thermal conductances of a solid grid cell when TSVs are inserted. The entrance effect of microchannels is also investigated for accurate modeling of microfluidic cooling. The proposed thermal simulator is verified against the commercial multiphysics solver COMSOL and compared with Hotspot and 3D-ICE. Simulation results show that, for heatsink cooling, the proposed simulator is as accurate as Hotspot but runs much faster at moderate granularity. For microfluidic cooling, the proposed simulator is much more accurate than 3D-ICE in its estimation of steady-state temperature and thermal distribution.
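To give a feel for why a grid cell containing a TSV conducts heat differently along and across the via, the sketch below applies two textbook mixing rules. It is not the paper's formulation: the vertical value uses an area-weighted parallel mix, the lateral value uses a series mix that is only a lower bound, and the conductivities of silicon and copper are approximate values assumed for illustration. All names are hypothetical.

```python
# Approximate bulk thermal conductivities in W/(m*K); assumed for illustration.
K_SILICON = 150.0
K_COPPER = 400.0


def _check_fill(fill):
    """Validate the TSV area fraction of the grid cell."""
    if not 0.0 <= fill <= 1.0:
        raise ValueError("fill fraction must lie in [0, 1]")


def vertical_conductivity(fill, k_tsv=K_COPPER, k_bulk=K_SILICON):
    """Parallel (area-weighted) mix for heat flowing along the TSV axis."""
    _check_fill(fill)
    return fill * k_tsv + (1.0 - fill) * k_bulk


def lateral_conductivity_lower_bound(fill, k_tsv=K_COPPER, k_bulk=K_SILICON):
    """Series mix: a lower bound for heat flowing across the TSV."""
    _check_fill(fill)
    return 1.0 / (fill / k_tsv + (1.0 - fill) / k_bulk)


if __name__ == "__main__":
    fill = 0.1  # hypothetical 10% of the cell area occupied by the via
    print(vertical_conductivity(fill), lateral_conductivity_lower_bound(fill))
```

With a 10% fill fraction and the assumed material values, the two rules give roughly 175 and 160 W/(m·K), so even this crude model produces a direction-dependent conductivity for the cell.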


Citations: 24

SMYLE Project: Toward high-performance, low-power computing on manycore-processor SoCs

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509655

Koji Inoue

This paper introduces a manycore research project called SMYLE (Scalable ManYcore for Low Energy computing). The aims of this project are: 1) proposing a manycore SoC architecture and developing a suitable programming and execution environment, 2) designing a domain specific manycore system for emerging video mining applications, and 3) releasing developed software tools and FPGA emulation environments to accelerate manycore research and development in the community. The project started in December 2010 with full support from the New Energy and Industrial Technology Development Organization (NEDO).


Citations: 2

Over 10-times high-speed, energy efficient 3D TSV-integrated hybrid ReRAM/MLC NAND SSD by intelligent data fragmentation suppression

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509566

Chao Sun, Hiroki Fujii, K. Miyaji, K. Johguchi, K. Higuchi, K. Takeuchi

A 3D through-silicon-via (TSV)-integrated hybrid ReRAM/multi-level-cell (MLC) NAND solid-state drive (SSD) architecture is proposed, with a NAND-like interface (I/F) and a sector-access overwrite policy for ReRAM. Furthermore, intelligent data management algorithms are proposed to suppress data fragmentation and excess usage of MLC NAND. As a result, an 11-times performance increase, a 6.9-times endurance enhancement and a 93% write energy reduction are achieved. Both ReRAM write and read latency should be less than 3 μs to obtain these improvements. The required endurance for ReRAM is 10^5.


Citations: 9

DARNS: A randomized multi-modulo RNS architecture for double-and-add in ECC to prevent power analysis side channel attacks

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509667

Jude Angelo Ambrose, H. Pettenghi, L. Sousa

Security in embedded systems is of critical importance, since most of our secure transactions are currently made via credit cards or mobile phones. Power-analysis-based side channel attacks have proven to be the most successful attacks on embedded systems for retrieving secret keys, allowing impersonation and theft. State-of-the-art countermeasures against such attacks in Elliptic Curve Cryptography (ECC), mostly implemented in software, hinder performance and have been repeatedly attacked using improved techniques. To protect ECC from both simple power analysis and differential power analysis, we propose, as a hardware solution, to take advantage of the inherent parallelization capability of Multi-modulo Residue Number System (RNS) architectures to obfuscate the secure information. Random selection of moduli is proposed to randomly choose the moduli sets for each key bit operation. This solution allows us to prevent power analysis while still providing all the benefits of RNS. In this paper, we show that Differential Power Analysis is thwarted, as well as correlation analysis.
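To make the residue number system (RNS) idea concrete, here is a minimal Python sketch, not the authors' hardware design. It represents an integer by its residues under pairwise-coprime moduli, multiplies channel by channel, and recovers the result with the Chinese Remainder Theorem. The moduli pool, the function names, and the use of `random.Random.sample` to pick a fresh moduli set per operation are all assumptions made for illustration.

```python
import random
from math import prod

# Hypothetical pool of pairwise-coprime moduli (an assumption, not from the paper).
MODULI_POOL = [251, 253, 255, 256, 257, 259]


def pick_moduli(k, rng):
    """Randomly choose k moduli from the pool for one operation."""
    return rng.sample(MODULI_POOL, k)


def to_rns(x, moduli):
    """Forward conversion: one residue per modulus."""
    return [x % m for m in moduli]


def rns_mul(a, b, moduli):
    """Channel-wise multiplication; each channel is independent of the others."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]


def from_rns(residues, moduli):
    """Reverse conversion via the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse, Python 3.8+
    return total % big_m


if __name__ == "__main__":
    rng = random.Random()  # a real design would need a cryptographic source
    moduli = pick_moduli(3, rng)
    a, b = 1234, 5678
    product = from_rns(rns_mul(to_rns(a, moduli), to_rns(b, moduli), moduli), moduli)
    print(moduli, product == a * b)
```

Because every channel works on a different modulus, changing the moduli set between operations changes the intermediate values being processed even when the operands stay the same, which is the intuition behind using random moduli selection against power analysis. The result is exact only while it stays below the product of the chosen moduli.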


Citations: 7
