2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Papers

A full 4-channel 60 GHz direct-conversion transceiver

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509573

Seitaro Kawai, R. Minami, Ahmed Musa, Takahiro Sato, Ning Li, Tatsuya Yamaguchi, Y. Takeuchi, Yuki Tsukui, K. Okada, A. Matsuzawa

This paper presents a 60-GHz direct-conversion transceiver in 65 nm CMOS technology. Using the proposed gain-peaking technique, the transceiver achieves good gain flatness and supports more than 7 Gbps in 16QAM wireless communication on all channels of the IEEE 802.11ad standard, with an EVM of around -23 dB. The transceiver consumes 319 mW when transmitting and 223 mW when receiving, including the PLL consumption.
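As a rough sanity check on the figures quoted above, the following sketch converts the EVM from dB to a percentage and computes the raw 16QAM bit rate. The 1.76 GS/s symbol rate is the IEEE 802.11ad single-carrier rate and is an assumption here, since the abstract does not state it; the function names are illustrative only.

```python
import math


def evm_db_to_percent(evm_db: float) -> float:
    """Convert an EVM given in dB (20*log10 of the RMS error ratio) to percent."""
    return 100.0 * 10.0 ** (evm_db / 20.0)


def raw_bit_rate_gbps(symbol_rate_gsps: float, constellation_points: int) -> float:
    """Raw bit rate = symbol rate x bits per symbol (coding overhead ignored)."""
    return symbol_rate_gsps * math.log2(constellation_points)


# Assumed value: IEEE 802.11ad single-carrier symbol rate (not given in the abstract).
SYMBOL_RATE_GSPS = 1.76

evm_percent = evm_db_to_percent(-23.0)
rate_gbps = raw_bit_rate_gbps(SYMBOL_RATE_GSPS, 16)

print(f"EVM of -23 dB is about {evm_percent:.2f}% RMS error")
print(f"Raw 16QAM rate at {SYMBOL_RATE_GSPS} GS/s is about {rate_gbps:.2f} Gbps")
```

Under that assumption, 16QAM carries 4 bits per symbol, which is consistent with the "more than 7 Gbps" figure in the abstract.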

Citations: 14

BAMSE: A balanced mapping space exploration algorithm for GALS-based manycore platforms

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509642

Mohammad H. Foroozannejad, Brent Bohnenstiehl, S. Ghiasi

We study the problem of mapping concurrent tasks of an application modeled as a data flow graph onto processors of a GALS-based manycore platform. We propose a mapping algorithm called BAMSE, which exploits the characteristics of streaming applications and the specifications of the target architecture to optimize the mapping solution. Configuration parameters embedded in the algorithm enable one to strike a balance between the scalability of the approach and the quality of the generated solutions. Experiments with several real-life applications show that our algorithm outperforms hand-optimized manual mappings by up to 65% in terms of the longest inter-processor communication link, and by as much as 19% with respect to the total length of the links, when the two criteria are used as primary and secondary optimization objectives, respectively. Additionally, our algorithm delivers superior mappings compared to ILP-generated solutions obtained after 10 days of solver runtime.
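The two optimization criteria named in the abstract can be illustrated with a small sketch. It is not the BAMSE algorithm itself: it only scores a given mapping by (longest link, total link length) and compares mappings lexicographically. The 2D mesh with Manhattan link lengths, the task names, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Edge = Tuple[str, str]


def link_lengths(edges: List[Edge], placement: Dict[str, Coord]) -> List[int]:
    """Manhattan length of each communication link (assumes a 2D mesh)."""
    lengths = []
    for src, dst in edges:
        (sx, sy), (dx, dy) = placement[src], placement[dst]
        lengths.append(abs(sx - dx) + abs(sy - dy))
    return lengths


def mapping_cost(edges: List[Edge], placement: Dict[str, Coord]) -> Tuple[int, int]:
    """Cost tuple: (longest link, total link length).

    Python compares tuples lexicographically, so the longest link acts as
    the primary objective and the total length as the secondary one.
    """
    lengths = link_lengths(edges, placement)
    return (max(lengths, default=0), sum(lengths))


# Hypothetical four-task streaming graph with one feedback-style edge.
edges = [("src", "fir"), ("fir", "fft"), ("fft", "sink"), ("src", "sink")]

mapping_square = {"src": (0, 0), "fir": (0, 1), "fft": (1, 1), "sink": (1, 0)}
mapping_line = {"src": (0, 0), "fir": (0, 1), "fft": (0, 2), "sink": (0, 3)}

cost_square = mapping_cost(edges, mapping_square)
cost_line = mapping_cost(edges, mapping_line)
best = min([mapping_square, mapping_line], key=lambda m: mapping_cost(edges, m))

print("square:", cost_square, "line:", cost_line)
```

Here the square placement wins on both criteria; when two mappings tie on the longest link, the total length breaks the tie.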

Citations: 1

A physical unclonable function chip exploiting load transistors' variation in SRAM bitcells

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509565

S. Okumura, S. Yoshimoto, H. Kawaguchi, M. Yoshimoto

We propose a chip identification (ID) generation scheme that exploits random variation of transistor characteristics in SRAM bitcells. In the proposed scheme, a unique fingerprint is generated by grounding both bitlines. The scheme is fast and can be implemented with a very small area overhead. We fabricated test chips in a 65-nm process and obtained 12,288 sets of unique 128-bit fingerprints, which are evaluated in this paper. The failure rate of the IDs is found to be 2.1 × 10^-12.
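One common way to check that a set of fingerprints is unique is to compare them pairwise by Hamming distance. The abstract does not say which metric the authors use, so the sketch below is only an illustration of that general approach; the three example IDs are made up and the function names are hypothetical.

```python
from itertools import combinations
from typing import List

ID_BITS = 128  # fingerprint width quoted in the abstract


def hamming_distance(id_a: int, id_b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(id_a ^ id_b).count("1")


def all_unique(ids: List[int]) -> bool:
    """True if no two fingerprints are identical."""
    return len(set(ids)) == len(ids)


def min_pairwise_distance(ids: List[int]) -> int:
    """Smallest Hamming distance over all pairs (larger means better separation)."""
    return min(hamming_distance(a, b) for a, b in combinations(ids, 2))


# Made-up 128-bit example fingerprints, not measured data.
example_ids = [
    0,                                 # all zeros
    (1 << ID_BITS) - 1,                # all ones
    int("AA" * (ID_BITS // 8), 16),    # alternating 1010... pattern
]

print("unique:", all_unique(example_ids))
print("minimum pairwise Hamming distance:", min_pairwise_distance(example_ids))
```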

Citations: 3

Symmetrical buffered clock-tree synthesis with supply-voltage alignment

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509637

Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao

For high-performance synchronous systems, non-uniform/non-ideal supply voltages of buffers (e.g., due to IR-drop) may incur a large clock skew and thus serious performance degradation. This paper addresses this problem and presents the first symmetrical buffered clock-tree synthesis flow that considers supply voltage differences of buffers. We employ a two-phase technique: bottom-up clock sink clustering to determine the tree topology, followed by top-down buffer placement and wire routing to complete the clock tree. At each level of processing, clock skew and wirelength are minimized by the determination of buffer embedding regions and the alignment of buffer supply voltages. Experimental results show that our method achieves, on average, 76% and 40% clock skew reduction compared to the state-of-the-art work (1) without supply voltage consideration and (2) with an extension for supply voltages based on our top-down flow, respectively. The reduction is achieved with marginal resource and runtime overheads. Note that our method meets the stringent skew constraint set by the 2010 ISPD contest for all cases, while the other approaches cannot. In particular, our work provides a key insight into the importance of handling practical design issues (such as IR-drop) for real-world clock-tree synthesis.

Citations: 5

Dependable VLSI Platform using Robust Fabrics

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509583

H. Onodera

Technology scaling and growing complexity have an increasing impact on the resilience of VLSI circuits and systems. Severe challenges have been emerging for the realization of dependable VLSI circuits and systems with a necessary and sufficient amount of reliability and security. To cope with the increasing threats to manufacturability, variability, and transient (soft) errors, we have been working on the development of the “Dependable VLSI Platform using Robust Fabrics.” The project tackles these challenges through collaborative research on layout, circuit, architecture, and design automation. An overview of the project as well as key achievements at the component level (Fabrics) and the architecture level (reconfigurable architecture) will be explained, followed by a brief introduction of the platform SoC and its C-based design tools.

Citations: 0

Pulsed-latch ASIC synthesis in industrial design flow

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509621

Sangmin Kim, Duckhwan Kim, Youngsoo Shin

The flip-flop has long been the sequencing element of choice in ASIC design; commercial synthesis tools have also been developed in this context. This work is motivated by the question of whether existing CAD tools can be employed from RTL to layout when the pulsed latch replaces the flip-flop as the sequencing element. Two important problems are identified and solutions are proposed: placement of pulse generators and latches to preserve the integrity of the pulse shape, and design of special scan latches and their selective use to reduce hold violations. A reference design flow has also been set up using published documents, in order to assess the proposed one. In 40-nm technology, the proposed flow achieves a 20% reduction in circuit area and a 30% reduction in power consumption, on average across 12 test circuits.

Citations: 0

Optimizing routability in large-scale mixed-size placement

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509636

J. Cong, Guojie Luo, Kalliopi Tsota, Bingjun Xiao

One of the necessary requirements for the placement process is that it should be capable of generating routable solutions. This paper describes a simple but effective method leading to the reduction of the routing congestion and the final routed wirelength for large-scale mixed-size designs. In order to reduce routing congestion and improve routability, we propose blocking narrow regions on the chip. We also propose dummy-cell insertion inside regions characterized by reduced fixed-macro density. Our placer consists of three major components: (i) narrow channel reduction by performing neighbor-based fixed-macro inflation; (ii) dummy-cell insertion inside large regions with reduced fixed-macro density; and (iii) pre-placement inflation by detecting tangled logic structures in the netlist and minimizing the maximum pin density. We evaluated the quality of our placer using the newly released DAC 2012 routability-driven placement contest designs and we compared our results to the top four teams that participated in the placement contest. The experimental results reveal that our placer improves the routability of the DAC 2012 placement contest designs and effectively reduces the routing congestion.

Citations: 23

On real-time STM concurrency control for embedded software with improved schedulability

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509557

Mohammed El-Shambakey, B. Ravindran

We consider software transactional memory (STM) concurrency control for embedded multicore real-time software, and present a novel contention manager for resolving transactional conflicts, called PNF. We derive upper bounds on transactional retries and task response times. Our implementation in RSTM/real-time Linux reveals that PNF yields retry costs that are shorter than or comparable to those of competitors.

Citations: 8

Network flow modeling for escape routing on staggered pin arrays

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509595

Pei-Ci Wu, Martin D. F. Wong

Recently, staggered pin arrays have been introduced for modern designs with high pin density. Although some studies have been done on escape routing for hexagonal arrays, the hexagonal array is only a special kind of staggered pin array. Other kinds of staggered pin arrays exist in current industrial designs, and the existing works cannot be extended to handle them. In this paper, we study the escape routing problem on staggered pin arrays. Network flow models are proposed to correctly model the capacity constraints of staggered pin arrays. Our models are guaranteed to find an escape routing satisfying the capacity constraints if one exists. The correctness of these models leads to an optimal algorithm.
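The general idea of casting escape routing as a network flow problem can be sketched on a much simpler setting than the one the paper targets. The example below assumes an ordinary orthogonal grid (not a staggered array) in which every grid node can carry at most one route, enforced by splitting each node into an "in" and an "out" vertex. It is not the authors' model; the grid, the capacities, and all function names are assumptions for illustration.

```python
from collections import defaultdict, deque


def max_flow(cap, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts residual capacity map."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:      # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:             # find the path bottleneck
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:             # push flow, update residuals
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def escapable_pins(rows, cols, pins):
    """Maximum number of pins that can be routed to the array boundary."""
    cap = defaultdict(lambda: defaultdict(int))
    for r in range(rows):
        for c in range(cols):
            cap[("in", r, c)][("out", r, c)] = 1          # node capacity of one route
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    cap[("out", r, c)]["T"] = 1           # stepping off the grid = escaped
                elif (nr, nc) not in pins:                # routes may not cross other pins
                    cap[("out", r, c)][("in", nr, nc)] = 1
    for r, c in pins:
        cap["S"][("in", r, c)] = 1                        # one route per pin
    return max_flow(cap, "S", "T")


full_array = {(r, c) for r in range(3) for c in range(3)}
print(escapable_pins(3, 3, full_array))
```

In a fully populated 3×3 array the centre pin is boxed in by its neighbours, so only the eight boundary pins escape in this simplified model.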

Citations: 5

Shared cache aware task mapping for WCRT minimization

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2013-04-29 DOI: 10.1109/ASPDAC.2013.6509688

Huping Ding, Yun Liang, T. Mitra

The Worst-Case Response Time (WCRT) of multi-tasking applications running on multi-cores is an important metric for real-time embedded systems. The WCRT is determined by the mapping of the tasks to the cores (which determines load balancing) and the Worst-Case Execution Time (WCET) of the tasks. However, the WCET of a task is also influenced by the conflicts in the shared cache from tasks executing concurrently on other cores in a multi-core system. In other words, the mapping of the tasks to the cores indirectly influences the WCET of the tasks, which in turn impacts the WCRT of the entire application. Thus, the mapping of the tasks to the cores should simultaneously maximize workload balance and minimize shared cache interference. We propose an integer-linear programming (ILP) formulation to achieve this objective. Experimental evaluation shows that shared cache aware task mapping achieves, on average, 25% and 33% WCRT reduction for real-life and synthetic applications, respectively, compared to the traditional approach that is agnostic to shared cache conflicts and focuses solely on load balancing.
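The interaction between mapping, WCET, and WCRT described above can be illustrated with a toy model. The sketch below is not the paper's ILP formulation: it replaces the ILP with exhaustive search over all mappings, and it assumes that a task's WCET grows by a fixed penalty for every conflicting task placed on a different core. The WCET values, the penalty matrix, and the function names are invented for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def wcrt(assignment: Sequence[int], base_wcet: List[int],
         penalty: List[List[int]]) -> int:
    """Toy WCRT: the largest per-core sum of (base WCET + cache interference).

    Assumption: task i suffers penalty[i][j] only when task j runs on a
    different core, i.e. can execute concurrently and evict its cache lines.
    """
    load = {}
    for i, core in enumerate(assignment):
        interference = sum(penalty[i][j] for j, other in enumerate(assignment)
                           if other != core)
        load[core] = load.get(core, 0) + base_wcet[i] + interference
    return max(load.values())


def best_mapping(num_cores: int, base_wcet: List[int],
                 penalty: List[List[int]]) -> Tuple[int, ...]:
    """Exhaustive search over all task-to-core mappings (stand-in for the ILP)."""
    return min(product(range(num_cores), repeat=len(base_wcet)),
               key=lambda a: wcrt(a, base_wcet, penalty))


# Invented example: tasks 0/1 and tasks 2/3 conflict heavily in the shared cache.
base_wcet = [10, 10, 10, 10]
penalty = [
    [0, 8, 0, 0],
    [8, 0, 0, 0],
    [0, 0, 0, 8],
    [0, 0, 8, 0],
]

balanced_only = (0, 1, 0, 1)   # balances load but separates the conflicting pairs
cache_aware = best_mapping(2, base_wcet, penalty)

print("load-balanced mapping WCRT:", wcrt(balanced_only, base_wcet, penalty))
print("cache-aware mapping:", cache_aware, "WCRT:", wcrt(cache_aware, base_wcet, penalty))
```

Both mappings put two tasks on each core, yet the toy WCRT differs because co-locating conflicting tasks removes their concurrent cache interference. Exhaustive search only works for a handful of tasks, which is one reason a solver-based formulation is attractive at realistic sizes.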

Citations: 20
