2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Papers

Local approximation improvement of trajectory piecewise linear macromodels through Chebyshev interpolating polynomials

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509693

M. Farooq, L. Xia

We introduce the concept of two-dimensional (2D) scalability of trajectory piecewise linear (TPWL) macromodels by exploiting Chebyshev interpolating polynomials in each piecewise region. The goal of 2D scalability is to improve the local approximation properties of TPWL macromodels. Horizontal scalability is achieved by reducing the number of linearization points along the trajectory; vertical scalability is obtained by extending the scope of the macromodel to predict the response of a nonlinear system for inputs far from the training trajectory. In this way, more efficient macromodels are obtained in terms of simulation speed-up of complex nonlinear systems. The methodology is developed to predict the nonlinear responses generated by faults introduced during fabrication in Micro Electro-Mechanical Systems (MEMS) accelerometers, which are used to obtain seismic images for oil and gas discovery. We provide the implementation details and illustrate the 2D scalability concept with an example using a nonlinear transmission line.
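The abstract does not give the formulation, so the following is only a scalar sketch of the underlying idea: within one region, a polynomial interpolated at Chebyshev nodes can track a nonlinear function more closely than a single linearization. The test function `exp`, the interval [0, 1], the five nodes, and all function names are assumptions chosen for illustration, not the authors' TPWL model.

```python
import math

def chebyshev_nodes(a, b, n):
    """Return the n Chebyshev nodes (first kind) mapped onto [a, b]."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return [mid + half * math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]

def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial interpolating (xs, ys) at x using the Lagrange form."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

f = math.exp                      # stand-in nonlinearity (assumption)
a, b = 0.0, 1.0                   # one "piecewise region" (assumption)
xs = chebyshev_nodes(a, b, 5)
ys = [f(x) for x in xs]

def linear_model(x, x0=0.5):
    """First-order Taylor model of exp around x0, standing in for one linearization point."""
    return f(x0) + f(x0) * (x - x0)

# Compare worst-case errors of both local models over the region.
grid = [a + (b - a) * i / 100 for i in range(101)]
err_cheb = max(abs(lagrange_eval(xs, ys, x) - f(x)) for x in grid)
err_lin = max(abs(linear_model(x) - f(x)) for x in grid)
print(f"max error, Chebyshev interpolant: {err_cheb:.2e}")
print(f"max error, single linearization:  {err_lin:.2e}")
```

In this toy setting one interpolated region covers an interval that would otherwise need several linearization points, which is the intuition behind reducing the number of points along the trajectory.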

Citations: 7

Support tools for porting legacy applications to multicore

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509658

Yuri Ardila, Natsuki Kawai, Takashi Nakamura, Yosuke Tamura

This paper presents PEMAP, an automated performance estimation tool that projects the performance of hand-parallelized programs from sequential programs, and BEMAP, a benchmark suite for measuring the performance of an auto-parallelizer or even a machine. BEMAP is an open-source project, and documentation covering code explanations and experimental results is also provided. Our experiments with PEMAP show that we can estimate the performance of hand-parallelized programs with an error of 0.44% of the sequential program's performance on average, while experiments with BEMAP show that the ability of an auto-parallelizer can be measured by comparing the compiled code to the hand-tuned parallelized OpenCL code, thereby assisting the development of the auto-parallelizer tool.

Citations: 7

A 6.72-Gb/s, 8pJ/bit/iteration WPAN LDPC decoder in 65nm CMOS

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509569

Zhixiang Chen, Xiao Peng, Xiongxin Zhao, Leona Okamura, Dajiang Zhou, S. Goto

An LDPC decoder in 65nm CMOS targeting WPAN (IEEE 802.15.3c) is presented with measurement results. A modified-PCM-based message permutation strategy with compatible data flow is proposed to solve the network problem raised by highly parallel LDPC decoding. Compared to the state of the art, the decoder chip achieves 17.7%, 33.5% and 49% improvements in chip density, gate count and energy efficiency, respectively.

Citations: 3

Optimizing multi-level combinational circuits for generating random bits

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509586

Chen Wang, Weikang Qian

Random bits are an important construct in many applications, such as hardware-based implementation of probabilistic algorithms and weighted random testing. One approach to generating random bits with required probabilities is to synthesize combinational circuits that transform a set of source probabilities into target probabilities. In [1], the authors proposed a greedy algorithm that synthesizes circuits in the form of a gate chain to approximate target probabilities. However, since this approach only considers circuits of such a special form, the resulting circuits are unsatisfactory in terms of both the approximation error and the circuit depth. In this paper, we propose a new algorithm to synthesize combinational circuits for generating random bits. Compared to the previous one, our approach greatly enlarges the search space. Also, we apply a linear property of probabilistic logic computation and an iterative local search method to increase the efficiency of our algorithm. Experimental results comparing the approximation errors and the depths of the circuits synthesized by our method to those of the circuits synthesized by the previous approach demonstrate the superiority of our method.
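As a rough illustration of how a combinational circuit transforms source probabilities, the sketch below computes the output probability of a small circuit by enumerating all input patterns, assuming the inputs are independent. The example circuit, the probabilities, and the function names are assumptions; the check at the end shows one standard sense in which such computations are linear (the output probability is a linear function of any single input probability), which may or may not be the exact property the paper applies.

```python
from itertools import product

def output_probability(func, probs):
    """P(func(bits) == 1) when bit i is 1 with probability probs[i], independently."""
    total = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for bit, p in zip(bits, probs):
            weight *= p if bit else 1.0 - p
        if func(*bits):
            total += weight
    return total

def chain(a, b, c):
    """Two-gate chain: (a AND b) OR c."""
    return (a & b) | c

# Source probabilities 0.4, 0.5, 0.5 are transformed into 0.6 at the output.
p_out = output_probability(chain, [0.4, 0.5, 0.5])

# Linearity in input a: P(out) = (1 - p_a) * P(out | a=0) + p_a * P(out | a=1).
p_a = 0.4
p_given_0 = output_probability(lambda b, c: chain(0, b, c), [0.5, 0.5])
p_given_1 = output_probability(lambda b, c: chain(1, b, c), [0.5, 0.5])
p_linear = (1 - p_a) * p_given_0 + p_a * p_given_1

print(p_out, p_linear)
```

Exhaustive enumeration is exponential in the number of inputs, so it only serves to check small candidate circuits like this one.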

Citations: 3

Design of a clock jitter reduction circuit using gated phase blending between self-delayed clock edges

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509577

K. Niitsu, Naohiro Harigai, D. Hirabayashi, D. Oki, Masato Sakurai, O. Kobayashi, Takahiro J. Yamaguchi, Haruo Kobayashi

We present the design of a clock jitter reduction circuit that exploits phase blending between uncorrelated clock edges that are self-delayed by multiples of the clock cycle, nT. By blending uncorrelated clock edges, the output clock edges approach the ideal timing and, thus, timing jitter can be reduced by a factor of √2 per stage. There are three technical challenges in realizing this: 1) generating uncorrelated clock edges, 2) phase averaging with a small time offset from the ideal center position, and 3) minimizing the deviation of the nT-delay from the ideal nT. The proposed circuit overcomes each of these by exploiting an nT-delay, gated phase blending, and self-calibrated nT-delay elements, respectively. Measurement results with a 180-nm CMOS prototype chip demonstrated an approximately four-fold reduction in timing jitter, from 30.2 ps to 8.8 ps, in a 500-MHz clock by cascading four stages of the proposed circuit.
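The √2-per-stage figure can be checked with a short calculation. The sketch below assumes each stage averages two uncorrelated edges with equal RMS jitter, so the ideal output jitter after N stages is the input jitter divided by (√2)^N; the function name is hypothetical and the only data used are the numbers quoted in the abstract.

```python
import math

def ideal_jitter_ps(jitter_in_ps, stages):
    """Ideal RMS jitter after cascading stages that each reduce jitter by sqrt(2)."""
    return jitter_in_ps / math.sqrt(2) ** stages

jitter_in_ps = 30.2      # measured input jitter quoted in the abstract
jitter_out_ps = 8.8      # measured output jitter quoted in the abstract

ideal_ps = ideal_jitter_ps(jitter_in_ps, 4)       # four stages: ideal factor of 4
measured_factor = jitter_in_ps / jitter_out_ps

print(f"ideal output jitter after 4 stages: {ideal_ps:.2f} ps")
print(f"measured reduction factor: {measured_factor:.2f}")
```

Under this assumption the ideal four-stage result is about 7.55 ps, so the measured 8.8 ps corresponds to a reduction factor of roughly 3.4 rather than exactly 4.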

Citations: 3

An adaptive filtering mechanism for energy efficient data prefetching

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509617

Xianglei Dang, Xiaoyin Wang, Dong Tong, Zichao Xie, Lingda Li, Keyi Wang

As data prefetching is used in embedded processors, reducing wasted energy is crucial to improving energy efficiency. In this paper, we propose an adaptive prefetch filtering (APF) mechanism to reduce the wasted bandwidth and energy as well as the cache pollution caused by useless prefetches. APF records the prefetch-victim address pairs of issued prefetches and collects information about which address in each pair is first accessed by the processor to guide the filtering of newly generated useless prefetches. Meanwhile, filtered prefetches are recorded to build a feedback mechanism that avoids filtering useful prefetches. Experimental results demonstrate that APF reduces useless prefetches by an average of 53.81% with a mere 5.28% reduction of useful prefetches, thus reducing the memory access bandwidth consumption by 59.92% and the L2 cache energy by 6.19%. APF also improves the performance of several programs by reducing the cache pollution incurred by useless prefetches, thus gaining an average performance improvement of 2.12%.

Citations: 3

A binding algorithm in high-level synthesis for path delay testability

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509653

Yuki Yoshikawa

A binding method in high-level synthesis for path delay testability is proposed in this paper. For a given scheduled data flow graph, the proposed method synthesizes a path delay testable RTL datapath and its controller. Every path in the datapath is two-pattern testable with the controller if the path is activated in functional operation, i.e., the path is not a false path. Our experimental results show that the proposed method can synthesize such RTL circuits with small area overhead compared with circuits augmented by DFT techniques such as scan design.

Citations: 1

Simplification of C-RTL equivalent checking for fused multiply add unit using intermediate models

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509686

Bin Xue, Prosenjit Chatterjee, S. Shukla

The functionality of a fused multiply add (FMA) design can be formally verified by comparing its register transfer level (RTL) implementation against its system-level specification, often modeled in C/C++, using sequential equivalence checking (SEC). However, C-RTL SEC does not scale for FMA because of the huge discrepancy between the two models. This paper analyzes the dissimilarities and proposes two intermediate models, one abstract RTL model and one rewritten C model, to bridge the gap. The original SEC proof is partitioned into three sub-proofs among the intermediate models, where a variety of simplification techniques are applied to further reduce the complexity. Experiments from an industry project show that with the two intermediate models, the SEC proof is complete and scalable for FMA design.

Citations: 6

Improving the mapping of reversible circuits to quantum circuits using multiple target lines

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509587

R. Wille, Mathias Soeken, Christian Otterstedt, R. Drechsler

The efficient synthesis of quantum circuits is an active research area. Since many of the known quantum algorithms include a large Boolean component (e.g. the database in the Grover search algorithm), quantum circuits are commonly synthesized in a two-stage approach. First, the desired function is realized as a reversible circuit using existing synthesis methods for this domain. Afterwards, each reversible gate is mapped to a functionally equivalent quantum gate cascade. In this paper, we propose an improved mapping of reversible circuits to quantum circuits which exploits a certain structure of many reversible circuits. In fact, it can be observed that reversible circuits are often composed of similar gates which differ only in the position of their target lines. We introduce an extension of reversible gates which allows multiple target lines in a single gate. This enables a significantly cheaper mapping to quantum circuits. Experiments show that considering multiple target lines leads to improvements of up to 85% in the resulting quantum cost.
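The functional side of this observation can be sketched in a few lines, assuming a multiple-control Toffoli gate model in which a gate flips its target lines when all of its control lines are 1. The line indices and function names are hypothetical, and the sketch only checks functional equivalence on bit vectors; it says nothing about the quantum cost model or the mapping itself, which the abstract does not describe.

```python
from itertools import product

def apply_gate(state, controls, targets):
    """Flip every line in targets when all lines in controls carry a 1."""
    state = list(state)
    if all(state[c] for c in controls):
        for t in targets:
            state[t] ^= 1
    return tuple(state)

def two_single_target_gates(state):
    """Two similar gates: same controls (lines 0, 1), targets on lines 2 and 3."""
    state = apply_gate(state, controls=(0, 1), targets=(2,))
    return apply_gate(state, controls=(0, 1), targets=(3,))

def one_multi_target_gate(state):
    """A single gate with the same controls and both target lines."""
    return apply_gate(state, controls=(0, 1), targets=(2, 3))

all_states = list(product((0, 1), repeat=4))

# The two descriptions realize the same function on every input pattern.
equivalent = all(
    two_single_target_gates(s) == one_multi_target_gate(s) for s in all_states
)

# The function is a bijection on the 16 patterns, i.e. it is reversible.
is_reversible = len({one_multi_target_gate(s) for s in all_states}) == len(all_states)

print(equivalent, is_reversible)
```

Because the target lines are distinct from the control lines, the first gate never changes the condition evaluated by the second, which is why merging them preserves the function.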

Citations: 55

A dynamic stream link for efficient data flow control in NoC based heterogeneous MPSoC

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509556

C. Helmstetter, Sylvain Basset, R. Lemaire, F. Clermidy, P. Vivet, M. Langevin, Chuck Pilkington, P. Paulin, D. Fuin

As the size of Systems-on-Chip increases, communication costs become critical, and Networks-on-Chip (NoC) bring innovative solutions. Efficient stream-based protocols over NoC have been widely studied to address dataflow communications. They are usually controlled by a set of static parameters. However, new applications, such as high-resolution video decoders, present more data-dependent behaviors, forcing communication protocols to support higher dynamicity. For this purpose, we present dynamic stream links for stream-based end-to-end NoC communications by introducing two link protocols, both independent of the transfer size, which improve the flexibility of hardware/software control. The proposed protocols have been modeled in an MPSoC virtual platform and their hardware cost evaluated. Based on simulations, we provide guidelines for exploiting these protocols according to application needs.

Citations: 4
