
2008 Asia and South Pacific Design Automation Conference: Latest Papers

Panel: Best ways to use billions of devices on a chip
Pub Date : 2008-03-21 DOI: 10.1109/ASPDAC.2008.4484061
G. Martin
We all know that Moore's law will hold for at least a few more generations of silicon process technology, giving rise to many integrated circuits with billions of transistors. As of 2007, the leading 45 nm processors being announced are approaching a billion transistors. But how can we best use these devices in the future? Integrating more and more features and functions onto SoCs may not be the optimal use of all these billions of resources. Indeed, simply obtaining a working device at 45, 32, 22 and 16 nm may require new architectures and new structures. Among the many ideas that can be advanced to best use the 'billions and billions served' are: (1) multicore and multiprocessor systems; (2) yet more memory, to hold the embedded software and data required by multiprocessor architectures; (3) increasingly elaborate on-chip interconnect and network structures; (4) redundant structures for defect tolerance; (5) structures and architectures for dynamic error recovery; and (6) a variety of schemes for ever-lower power and energy consumption. At the same time, billions of transistors on a chip will pose increasing challenges to our design methodologies, integration approaches and design tools. How can we best conceive of, architect, design, integrate, verify and manufacture such devices? This panel draws on several academic and industry experts who will discuss their views on the best things to integrate into future ICs and the best ways to do that integration. It will give the audience an excellent opportunity to challenge and discuss these ideas and to advocate their own views. As well as the 'best' ways to use these resources, the panel will also discuss the 'worst' ways to proceed: which architectural dead-ends should be avoided as we move through each silicon process generation?
Citations: 0
Vertical via design techniques for multi-layered P/G networks
Pub Date : 2008-01-21 DOI: 10.1109/ASPDAC.2008.4484027
Shuai Li, Jin Shi, Yici Cai, Xianlong Hong
In multi-layered power/ground (P/G) networks, vertical vias are usually placed at the intersections of metal wires on adjoining layers to connect the whole network together. This paper presents an in-depth study of vertical via design. First, we present an efficient heuristic algorithm based on sensitivity analysis to optimize via allocation in the early design stage. Compared with even allocation, our algorithm reduces the worst-case voltage drop by 8.43% on average while using the same number of vias or fewer. The adjoint network method is also used, which significantly improves the efficiency of the algorithm. Next, we demonstrate that cross-layer vias, which link metal wires on nonadjacent layers, are effective at eliminating "hot" areas that suffer large voltage drops on the bottom layer. A similar heuristic algorithm is also developed for adding cross-layer vias.
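To make "sensitivity-driven via allocation" concrete, here is a toy sketch under heavy simplifying assumptions: the bottom layer is a 1-D chain of nodes joined by a wire resistance, each node sinks a fixed current, and a via ties a node to an ideal VDD through a via resistance. Vias are added greedily wherever they cut the worst drop the most. The abstract's algorithm uses adjoint-network sensitivities to avoid re-solving; this sketch simply re-solves the network for every candidate, a brute-force stand-in. All function names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (fine for toy sizes)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ir_drops(currents: Sequence[float], vias: Sequence[int],
             r_wire: float, r_via: float) -> List[float]:
    """Voltage drop (VDD - V) at each node of a hypothetical 1-D bottom-layer wire.

    Node i sinks currents[i]; neighbouring nodes are joined by r_wire, and a
    via at node i ties it to an ideal VDD through r_via. Needs at least one via.
    """
    n = len(currents)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):          # wire segment between node i and node i+1
        c = 1.0 / r_wire
        g[i][i] += c
        g[i + 1][i + 1] += c
        g[i][i + 1] -= c
        g[i + 1][i] -= c
    for i in vias:                  # via conductance to the ideal supply
        g[i][i] += 1.0 / r_via
    return solve(g, list(currents))


def greedy_vias(currents: Sequence[float], budget: int,
                r_wire: float, r_via: float) -> Tuple[List[int], List[float]]:
    """Add vias one at a time where they reduce the worst drop the most."""
    chosen: List[int] = []
    worst_history: List[float] = []
    for _ in range(budget):
        candidates = [i for i in range(len(currents)) if i not in chosen]
        # Brute-force "sensitivity": re-solve the network for each candidate.
        best = min(candidates,
                   key=lambda i: max(ir_drops(currents, chosen + [i], r_wire, r_via)))
        chosen.append(best)
        worst_history.append(max(ir_drops(currents, chosen, r_wire, r_via)))
    return chosen, worst_history


if __name__ == "__main__":
    load = [1e-3, 1e-3, 4e-3, 1e-3, 1e-3, 1e-3, 3e-3, 1e-3]  # amps, made-up values
    print(greedy_vias(load, budget=3, r_wire=0.5, r_via=0.2))
```

Because adding a via only adds conductance to the supply, each greedy step can only lower (never raise) the drops; the adjoint method in the abstract is what makes evaluating candidates cheap on realistic grids.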
Citations: 4
Scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n)
Pub Date : 2008-01-21 DOI: 10.1109/ASPDAC.2008.4484041
Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, K. Shimizu, Youhua Shi, N. Togawa, M. Yanagisawa, T. Ohtsuki
Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), a type of public-key cryptography. Montgomery multiplication is a commonly used technique for modular multiplication, and it must be scalable because the bit length of the operands varies with the security level. In addition, ECC is performed in GF(P) or GF(2^n), so unified GF(P)/GF(2^n) multiplier architectures are needed. However, because the critical path of the GF(P) circuit is longer, previous work had to change the clock frequency or adopt a dual-radix architecture to deal with the delay difference between the multiplier's GF(P) and GF(2^n) circuits. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n). The proposed architecture unifies four parallel radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path, making it possible to compute operands in both fields with the same multiplier at the same frequency, so no clock dividers are needed to handle the delay difference. Moreover, the parallel architecture in GF(P) reduces the extra clock cycles introduced by the dual-radix approach. As a result, the proposed architecture computes a 256-bit GF(P) Montgomery multiplication in 0.23 µs.
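The arithmetic being unified can be illustrated in software. The sketch below is not the paper's hardware: it shows a radix-2^w word-serial Montgomery product in GF(P) (w = 16 by default, echoing the radix-2^16 GF(P) units) and, for brevity, a bit-serial (radix-2) Montgomery product in GF(2^n) rather than radix-2^64. The two share the same structure (add a digit product, add a multiple of the modulus to clear the low digit, shift), except that GF(2^n) uses XOR, which has no carries and so needs no final subtraction. The helper names are hypothetical; `pow(p, -1, m)` requires Python 3.8+.

```python
def mont_mul_gfp(a: int, b: int, p: int, w: int = 16) -> int:
    """Radix-2^w Montgomery product a*b*R^-1 mod p, with R = 2^(w*s).

    p must be odd and a, b < p. One w-bit digit of `a` is processed per step.
    """
    s = (p.bit_length() + w - 1) // w          # number of w-bit digits
    mask = (1 << w) - 1
    p_neg_inv = (-pow(p, -1, 1 << w)) & mask     # -p^-1 mod 2^w
    t = 0
    for i in range(s):
        t += ((a >> (w * i)) & mask) * b         # add digit_i(a) * b
        q = ((t & mask) * p_neg_inv) & mask      # makes the low w bits of t + q*p zero
        t = (t + q * p) >> w                     # exact division by 2^w
    return t - p if t >= p else t                # invariant: t < 2p


def mont_mul_gf2(a: int, b: int, f: int, m: int) -> int:
    """Bit-serial Montgomery product a*b*x^-m mod f(x) in GF(2^m).

    Polynomials are ints (bit k = coefficient of x^k); f has degree m and
    constant term 1. Addition is XOR, so there is no final subtraction.
    """
    t = 0
    for i in range(m):
        if (a >> i) & 1:
            t ^= b                                # add a_i * b
        if t & 1:
            t ^= f                                # clear the x^0 coefficient
        t >>= 1                                   # exact division by x
    return t


def clmul(a: int, b: int) -> int:
    """Carry-less GF(2)[x] product, used only as a reference."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf2_mod(a: int, f: int) -> int:
    """Reduce polynomial a modulo f over GF(2)."""
    df = f.bit_length() - 1
    while a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a
```

For a 256-bit prime and w = 16, the loop runs 16 times and R = 2^256; a hardware design would run several such digit loops in parallel, as the abstract describes for GF(P).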
Citations: 3
LVDS-type on-chip transmision line interconnect with passive equalizers in 90nm CMOS process
Pub Date : 2008-01-21 DOI: 10.1109/ASPDAC.2008.4484069
A. Mineyama, Hiroyuki Ito, T. Ishii, K. Okada, K. Masu
This paper demonstrates a low-voltage differential signaling (LVDS)-type on-chip transmission line (TL) interconnect that addresses delay issues on global interconnects. The proposed on-chip TL interconnect achieves 10.5 Gbps signaling and, at high frequencies, offers smaller delay, smaller delay variation and better power efficiency than conventional on-chip interconnects.
Citations: 5
Behavioral synthesis with activating unused flip-flops for reducing glitch power in FPGA
Pub Date : 2008-01-21 DOI: 10.1109/ASPDAC.2008.4483919
C. Hsieh, J. Cong, Zhiru Zhang, Shih-Chieh Chang
In this paper, we discuss optimizing the interconnect power of designs implemented on FPGA platforms. In particular, we reduce the glitch power on interconnects driven by the outputs of functional units. The idea is to activate unused flip-flops to block glitch propagation, taking advantage of the abundant flip-flops in modern FPGA architectures. Because activating additional flip-flops may cause data hazards, we develop several effective behavioral synthesis techniques to prevent them, and we also study the optimality of these techniques. Experimental results show that our methods reduce dynamic power by 28% on average on the Xilinx Virtex-II platform.
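Why a flip-flop blocks glitches can be shown with a toy transition count. In this sketch, each clock cycle is a list of the values a single-bit functional-unit output passes through before settling; an unregistered downstream wire toggles on every intermediate value, while a registered one changes at most once per cycle (to the settled value). This is an illustrative model only, and the function names are hypothetical; it ignores the extra cycle of latency the flip-flop adds, which is exactly what the behavioral synthesis techniques in the abstract must schedule around to avoid data hazards.

```python
from typing import List, Sequence


def toggles(wave: Sequence[int]) -> int:
    """Number of 0<->1 transitions seen on a single-bit wire."""
    return sum(1 for x, y in zip(wave, wave[1:]) if x != y)


def unregistered_toggles(cycles: Sequence[Sequence[int]], initial: int = 0) -> int:
    """The downstream wire follows every intermediate (glitchy) value."""
    wave: List[int] = [initial]
    for settle_sequence in cycles:
        wave.extend(settle_sequence)
    return toggles(wave)


def registered_toggles(cycles: Sequence[Sequence[int]], initial: int = 0) -> int:
    """A flip-flop forwards only the value that settled by the clock edge."""
    wave = [initial] + [settle_sequence[-1] for settle_sequence in cycles]
    return toggles(wave)


if __name__ == "__main__":
    # Three cycles: glitch 1-0-1, clean 1, glitch 0-1-0 (made-up waveform).
    trace = [[1, 0, 1], [1], [0, 1, 0]]
    print(unregistered_toggles(trace), registered_toggles(trace))
```

For this trace, the unregistered wire toggles 6 times and the registered one twice; since dynamic interconnect power scales with switching activity, the difference is the glitch power being saved.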
Citations: 7
Fast, quasi-optimal, and pipelined instruction-set extensions
Pub Date : 2008-01-21 DOI: 10.1109/ASPDAC.2008.4483970
A. K. Verma, P. Brisk, P. Ienne
Nowadays, many customised embedded processors can speed up an application by implementing parts of it with application-specific functional units (AFUs). However, AFUs must satisfy constraints on the number of read and write ports between the AFU and the processor register file, and these restrictions keep AFUs small and simple. Recently, some work has relaxed the register file port constraints by serialising register file access (i.e., by allowing multi-cycle reads and writes). This makes the problem of selecting the best AFUs significantly more complex. Most previous approaches solve it with a two-stage process: first selecting AFUs under relaxed (higher) I/O constraints, and then serialising them under the actual register file port constraints. These methods are not only complex but also lead to suboptimal solutions. In this paper, we formulate AFU selection as an integer linear program and solve it optimally. We show experimentally that our methodology produces significantly better results than state-of-the-art techniques.
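The selection problem can be illustrated on a tiny data-flow graph. The sketch below stands in for the paper's integer linear program with exhaustive enumeration (exponential, so only usable for a handful of nodes) and uses a deliberately simple, hypothetical cost model: a candidate must be convex, its latency is its combinational critical path rounded up to whole cycles, and serialised register-file access adds one cycle for every extra batch of reads beyond the available read ports and every extra batch of writes beyond the write ports. The gain is the software cycles saved. None of these names, delays or formulas come from the paper.

```python
from itertools import combinations
from math import ceil
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def reachable(start: Iterable[str], succ: Dict[str, List[str]]) -> Set[str]:
    """Nodes reachable from `start` through one or more edges."""
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_convex(s: FrozenSet[str], succ: Dict[str, List[str]]) -> bool:
    """No path may leave s and re-enter it (else s cannot be one instruction)."""
    outside = reachable(s, succ) - s
    return not (reachable(outside, succ) & s)


def best_afu(order: List[str], sw: Dict[str, int], hw: Dict[str, float],
             edges: List[Tuple[str, str]], reg_in: Dict[str, Set[str]],
             live_out: Set[str], read_ports: int,
             write_ports: int) -> Tuple[FrozenSet[str], int]:
    """Exhaustively pick the convex subgraph with the largest cycle saving.

    `order` must be a topological order of the data-flow graph.
    """
    succ: Dict[str, List[str]] = {}
    pred: Dict[str, List[str]] = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        pred.setdefault(v, []).append(u)

    best: FrozenSet[str] = frozenset()
    best_gain = 0
    for k in range(1, len(order) + 1):
        for combo in combinations(order, k):
            s = frozenset(combo)
            if not is_convex(s, succ):
                continue
            # Operands read from the register file: live-in registers plus
            # values produced by nodes outside the candidate.
            ins: Set[str] = set()
            for n in s:
                ins |= reg_in.get(n, set())
                ins |= {u for u in pred.get(n, []) if u not in s}
            outs = {n for n in s
                    if n in live_out or any(v not in s for v in succ.get(n, []))}
            if not outs:
                continue
            # Longest combinational path inside s, measured in cycles.
            arrive: Dict[str, float] = {}
            for n in order:
                if n in s:
                    arrive[n] = hw[n] + max(
                        (arrive[u] for u in pred.get(n, []) if u in s), default=0.0)
            cycles = max(1, ceil(max(arrive.values()) - 1e-9))
            # Serialised register-file access: extra cycles beyond the first batch.
            cycles += max(0, ceil(len(ins) / read_ports) - 1)
            cycles += max(0, ceil(len(outs) / write_ports) - 1)
            gain = sum(sw[n] for n in s) - cycles
            if gain > best_gain:
                best, best_gain = s, gain
    return best, best_gain
```

With two read ports, a four-operation chain reading three registers still wins (one extra read cycle); an ILP solver explores the same space without enumerating every subset.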
Citations: 14
DPlace2.0: A stable and efficient analytical placement based on diffusion
Pub Date : 2008-01-21 DOI: 10.1109/ASPDAC.2008.4483972
T. Luo, D. Pan
Nowadays, a placement problem often involves millions of objects and large numbers of fixed blockages. We present a new global placement algorithm that scales well to modern large-scale circuit placement problems. We simulate the natural diffusion process to spread cells smoothly over the placement region, and use both analytical and discrete techniques to improve wire length. Although any analytical wire length technique can be used in our new framework, with the quadratic wire length model the Hessian of our formulation is extremely sparse compared with conventional formulations, which yields a 24x speedup in the quadratic solver. We also propose a wire linearization technique that exactly transforms the quadratic star model into HPWL. The overall runtime of our tool is close to that of the fastest placement tool in the existing literature and significantly better than the others. Meanwhile, our wire length results are competitive with the best known ones: the average total wire length is 2.2% higher than mPL6, and 0.2%, 3.1%, and 9.1% better than FastPlace3.0, APlace2.0, and Capo10.2, respectively.
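The diffusion idea can be sketched in one dimension. Cell density per bin evolves by the discrete heat equation (explicit FTCS update, reflecting boundaries so no density leaves the region), and each cell moves with velocity v = -∇d / d, so cells flow from crowded bins toward empty ones. This is a minimal illustration assuming unit-width bins, unit cell area and linear interpolation of bin-centre velocities; it omits the 2-D grid, fixed blockages and wire-length optimization, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _pad(d: Sequence[float]) -> List[float]:
    """Mirror the end bins so no density flows out of the region."""
    return [d[0]] + list(d) + [d[-1]]


def diffuse(d: Sequence[float], dt: float = 0.25, steps: int = 1) -> List[float]:
    """Explicit (FTCS) diffusion step(s); stable for dt <= 0.5 in 1-D."""
    cur = list(d)
    for _ in range(steps):
        p = _pad(cur)
        cur = [p[i + 1] + dt * (p[i] - 2 * p[i + 1] + p[i + 2]) for i in range(len(cur))]
    return cur


def velocity(d: Sequence[float]) -> List[float]:
    """v = -grad(d) / d at each bin centre, using central differences."""
    p = _pad(d)
    return [-(p[i + 2] - p[i]) / (2.0 * p[i + 1]) if p[i + 1] > 0 else 0.0
            for i in range(len(d))]


def _interp(v: Sequence[float], x: float) -> float:
    """Linearly interpolate bin-centre velocities at position x."""
    t = x - 0.5
    if t <= 0:
        return v[0]
    if t >= len(v) - 1:
        return v[-1]
    i = int(t)
    f = t - i
    return (1 - f) * v[i] + f * v[i + 1]


def spread(cells: Sequence[float], n_bins: int, steps: int = 50,
           dt: float = 0.25) -> Tuple[List[float], List[float]]:
    """Move cells along the velocity field while the density diffuses."""
    d = [0.0] * n_bins
    for x in cells:
        d[min(int(x), n_bins - 1)] += 1.0
    xs = list(cells)
    for _ in range(steps):
        v = velocity(d)
        xs = [min(max(x + dt * _interp(v, x), 0.0), n_bins - 1e-9) for x in xs]
        d = diffuse(d, dt)
    return xs, d
```

The reflecting boundary makes total density exactly conserved, and the diffusion drives peak density down smoothly, which is what gives the "stable" spreading the title refers to.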
Citations: 30
On reducing both shift and capture power for scan-based testing
Pub Date : 2008-01-21 DOI: 10.1109/ASPDAC.2008.4484032
Jia Li, Q. Xu, Yu Hu, Xiaowei Li
Power consumption in scan-based testing is a major concern today. In this paper, we present a new X-filling technique, called LSC-filling, that reduces both shift power and capture power during scan tests. The basic idea is to use as few X-bits as possible to keep the capture power under the peak power limit of the circuit under test (CUT), while using the remaining X-bits to reduce shift power and thereby cut the CUT's average power consumption during scan tests as much as possible. In addition, by carefully selecting the X-filling order, our technique achieves lower capture power than existing methods. Experimental results on the ISCAS'89 benchmark circuits show the effectiveness of the proposed methodology.
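As background on how X-filling affects shift power, here is the classic adjacent-fill heuristic, which copies the nearest preceding care bit into each X so that a scan vector has as few bit flips as its care bits allow. This is not LSC-filling: the abstract's method additionally reserves a few X-bits to hold capture power under the peak limit and chooses the fill order carefully, none of which is modelled here. The function names are hypothetical, and plain transition count is used as a rough proxy for shift switching.

```python
def adjacent_fill(cube: str) -> str:
    """Replace each X with the nearest preceding care bit.

    Leading Xs copy the first care bit; an all-X cube becomes all 0s.
    """
    last = next((c for c in cube if c in "01"), "0")
    out = []
    for c in cube:
        if c in "01":
            last = c
        out.append(last)
    return "".join(out)


def transitions(vector: str) -> int:
    """Adjacent bit flips, a rough proxy for scan-shift switching."""
    return sum(1 for a, b in zip(vector, vector[1:]) if a != b)


if __name__ == "__main__":
    cube = "1XX0XX1"
    filled = adjacent_fill(cube)
    print(filled, transitions(filled))
```

Adjacent fill reaches the minimum number of flips along the chain (one per change between consecutive care bits), but it spends every X on shift power, which is why a capture-aware scheme like the one described above must be more selective.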
Citations: 18
TCG-based multi-bend bus driven floorplanning
Pub Date : 2008-01-21 DOI: 10.1109/ASPDAC.2008.4483938
Tilen Ma, Evangeline F. Y. Young
In this paper, we address the problem of bus-driven floorplanning. Given a set of modules and bus specifications, a floorplan including the bus routes is generated such that the floorplan area and the total bus area are minimized. Some previous works addressed this problem with bus shapes restricted to 0, 1 or 2 bends (Law, 2005). In this paper, however, we address bus-driven floorplanning without any limitation on bus shapes. We solve the problem with a simulated-annealing-based floorplanner using the transitive closure graph (TCG) representation (Lin, 2001). Experimental results show that we significantly improve on (Law, 2005) in both runtime and quality, since there is more flexibility in routing the buses and no complex shape-validation steps are needed. For data sets with buses connecting a large number of blocks, our approach still generates high-quality solutions effectively, whereas the approach of (Law, 2005), which is restricted to 2-bend buses, often cannot find any feasible solution.
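For readers unfamiliar with TCG, the step that turns a representation into a floorplan is a longest-path computation. In a TCG, an edge (i, j) in the horizontal graph Ch means block i lies left of block j, an edge in the vertical graph Cv means i lies below j, and every pair of blocks is related in exactly one of the two graphs. Each block's x (or y) coordinate is the longest path reaching it in Ch (or Cv), weighted by block widths (or heights). The sketch below shows only this packing step for a tiny made-up example; bus routing, the annealing moves and the cost function are omitted, and the function name is hypothetical.

```python
from typing import Dict, Sequence, Tuple


def pack(widths: Sequence[float], heights: Sequence[float],
         ch: Sequence[Tuple[int, int]],
         cv: Sequence[Tuple[int, int]]) -> Tuple[Dict[int, float], Dict[int, float], float]:
    """Pack blocks from a TCG: (i, j) in ch means i is left of j, and
    (i, j) in cv means i is below j. Returns x, y and bounding-box area."""
    n = len(widths)
    left_of = {j: [i for i, k in ch if k == j] for j in range(n)}
    below = {j: [i for i, k in cv if k == j] for j in range(n)}

    def longest(j, preds, size, memo):
        # Coordinate = longest weighted path to j in the acyclic constraint graph.
        if j not in memo:
            memo[j] = max((longest(i, preds, size, memo) + size[i] for i in preds[j]),
                          default=0)
        return memo[j]

    x: Dict[int, float] = {}
    y: Dict[int, float] = {}
    for j in range(n):
        longest(j, left_of, widths, x)
        longest(j, below, heights, y)
    width = max(x[j] + widths[j] for j in range(n))
    height = max(y[j] + heights[j] for j in range(n))
    return x, y, width * height


if __name__ == "__main__":
    # Block 0 (2x1) left of block 1 (1x3); block 2 (3x1) above both.
    print(pack([2, 1, 3], [1, 3, 1], ch=[(0, 1)], cv=[(0, 2), (1, 2)]))
```

A simulated-annealing floorplanner would perturb Ch and Cv (for example by swapping nodes or reversing edges while keeping the graphs transitively closed) and re-run this packing to evaluate each candidate.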
Citations: 17
A debug probe for concurrently debugging multiple embedded cores and inter-core transactions in NoC-based systems
Pub Date : 2008-01-21 DOI: 10.1109/ASPDAC.2008.4483986
Shan Tang, Qiang Xu
Existing SoC debug techniques mainly target bus-based systems and are not readily applicable to emerging systems that use a network-on-chip (NoC) as their on-chip communication scheme. In this paper, we present the detailed design of a novel debug probe (DP) inserted between the core under debug (CUD) and the NoC. With embedded configurable triggers, delay control and a timestamping mechanism, the proposed DP is very effective both for analyzing inter-core transactions and for controlling the debug processes of embedded cores. Experimental results demonstrate the functionality of the proposed DP and report its area overhead.
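A behavioural model can help clarify what "configurable triggers" and "timestamping" mean at the probe. The sketch below assumes a simplified transaction record (source, destination, address, read/write) and a mask-and-match address comparator, which is a common trigger style; the paper's actual trigger fields, delay-control mechanism and hardware interface are not described in the abstract and are not modelled. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Transaction:
    """A simplified NoC transaction seen at the core/network boundary."""
    src: int
    dst: int
    addr: int
    is_write: bool


@dataclass
class Trigger:
    """Fires when (addr & addr_mask) == addr_match and the direction matches."""
    addr_mask: int
    addr_match: int
    is_write: Optional[bool] = None  # None matches both reads and writes

    def hit(self, t: Transaction) -> bool:
        if (t.addr & self.addr_mask) != self.addr_match:
            return False
        return self.is_write is None or t.is_write == self.is_write


@dataclass
class DebugProbe:
    """Timestamps every transaction that matches any configured trigger."""
    triggers: List[Trigger]
    cycle: int = 0
    trace: List[Tuple[int, Transaction]] = field(default_factory=list)

    def tick(self, t: Optional[Transaction] = None) -> None:
        """Advance one clock cycle, optionally observing one transaction."""
        if t is not None and any(tr.hit(t) for tr in self.triggers):
            self.trace.append((self.cycle, t))
        self.cycle += 1
```

For instance, a trigger with mask `0xFFFF0000` and match `0x40000000` restricted to writes records only writes into that 64 KiB window, each tagged with the local cycle count, so traces from several probes can later be merged to reconstruct inter-core ordering.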
Citations: 17