
2008 IEEE International Conference on Computer Design: Latest Papers

Analysis and minimization of practical energy in 45nm subthreshold logic circuits
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751876
D. Bol, R. Ambroise, D. Flandre, J. Legat
Over the last decade, the design of ultra-low-power digital circuits in the subthreshold regime has been driven by the quest for minimum energy per operation. In this contribution, we observe that operating at the minimum-energy point is not straightforward, as design constraints from real-life applications have an important impact on energy. Therefore, we introduce the alternative concept of practical energy, taking functional-yield and throughput constraints on minimum Vdd into account. In this context, we demonstrate for the first time the detrimental impact of DIBL on minimum Vdd. Practical energy gives a useful analysis framework for circuit optimization to reach the minimum-energy point, while considering the throughput as an input variable dictated by the application. From simulation of a benchmark multiplier in 45 nm technology, we find that practical energy can be far higher than the minimum-energy point in the case of low-throughput applications (≈ 10-100 kOp/s) because of static leakage energy and robustness-limited minimum Vdd. With the proposed framework, we investigate the capability of conventional optimization techniques to make practical energy meet the minimum-energy point. Amongst these techniques, channel length upsize is shown to be more efficient than MTCMOS power gating, body biasing, Vt selection or device width upsize, as it increases robustness while simultaneously reducing static leakage energy. A small length upsize with low area overhead is shown to reduce practical energy at low throughput to less than 2.1 times the minimum energy level. At medium throughput, it even brings practical energy 30% lower than the minimum energy level without optimization techniques.
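The trade-off described in this abstract can be illustrated with a first-order model. The sketch below is not the authors' simulation setup: it assumes a textbook exponential subthreshold current with a DIBL term, and every constant (`I0_ON`, `I0_LEAK`, `VT`, `ETA`, the capacitances, the robustness floor `v_robust`) is a made-up placeholder. It picks the lowest supply that satisfies both the robustness floor and the throughput constraint, then integrates leakage over the whole operation period.

```python
import math

PHI_T = 0.026   # thermal voltage near 300 K, volts
N_SLOPE = 1.5   # subthreshold slope factor (assumed)
VT = 0.35       # threshold voltage, volts (assumed)
ETA = 0.1       # DIBL coefficient, V/V (assumed)
I0_ON = 1e-6    # current scale of the critical-path driver, amps (assumed)
I0_LEAK = 1e-3  # current scale of total circuit leakage, amps (assumed)
C_PATH = 1e-13  # critical-path capacitance, farads (assumed)
C_SW = 1e-11    # switched capacitance per operation, farads (assumed)


def sub_current(i0, vgs, vds):
    """Exponential subthreshold current; DIBL lowers the barrier by ETA * vds."""
    return i0 * math.exp((vgs - VT + ETA * vds) / (N_SLOPE * PHI_T))


def delay(vdd):
    """Critical-path delay: charge C_PATH to vdd with the on-current."""
    return C_PATH * vdd / sub_current(I0_ON, vdd, vdd)


def energy_per_op(vdd, period):
    """Dynamic energy plus leakage integrated over the full operation period."""
    leak = sub_current(I0_LEAK, 0.0, vdd)
    return C_SW * vdd ** 2 + leak * vdd * period


def practical_energy(throughput, v_robust=0.30, v_max=0.50):
    """Return (vdd, energy) at the lowest vdd meeting both constraints.

    The sweep runs in 5 mV steps; the exponential model is only a rough
    extrapolation once vdd exceeds VT.
    """
    period = 1.0 / throughput
    for mv in range(round(v_robust * 1000), round(v_max * 1000) + 1, 5):
        vdd = mv / 1000.0
        if delay(vdd) <= period:
            return vdd, energy_per_op(vdd, period)
    raise ValueError("throughput not reachable at or below v_max")
```

With these placeholder values, both 10 kOp/s and 1 MOp/s settle at the robustness floor, and the slower case costs several times more energy per operation because leakage accumulates over a longer period. That is the qualitative effect the abstract attributes to low-throughput applications; the numbers themselves carry no meaning.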
Citations: 53
Power switch characterization for fine-grained dynamic voltage scaling
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751923
Liang Di, M. Putic, J. Lach, B. Calhoun
Dynamic voltage scaling (DVS) provides power savings for systems with varying performance requirements. One low overhead implementation of DVS uses PMOS power switches to connect DVS blocks to one of the available VDD supplies. While power switches have been analyzed extensively for leakage power gating, proper design of power switches for DVS is less well understood. This paper characterizes power switches for DVS in terms of VDD-switching delay and VDD-switching energy. We show the impact of these switching overheads on a novel fine-grained DVS architecture and present an RC model that allows fast estimation of the overhead. Measurements of a DVS multiplier and adder on a 90 nm CMOS test chip validate the model. Our model and measurements confirm that power switched DVS can provide sufficiently low overhead to give energy savings with only one clock cycle spent at a lower voltage, making this approach a flexible and enticing option for embedded portable systems.
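To make the two overhead metrics in this abstract concrete, here is a minimal first-order RC sketch. It is not the RC model from the paper: it assumes the block's supply rail behaves as one lumped capacitance `c_rail` charged through a switch of fixed on-resistance `r_on`, and the function names, the settling tolerance `tol`, and the break-even definition are all illustrative assumptions.

```python
import math


def switch_delay(r_on, c_rail, v_from, v_to, tol=0.01):
    """Time for the rail to settle within tol volts of v_to (single-pole RC)."""
    dv = abs(v_to - v_from)
    if dv <= tol:
        return 0.0
    return r_on * c_rail * math.log(dv / tol)


def recharge_energy(c_rail, v_low, v_high):
    """Energy drawn from the high supply to lift the rail from v_low to v_high."""
    return c_rail * v_high * (v_high - v_low)


def breakeven_cycles(c_rail, c_switched, v_low, v_high):
    """Cycles at v_low needed before dynamic savings cover the recharge energy.

    c_switched is the capacitance switched per clock cycle (assumed constant).
    """
    saved_per_cycle = c_switched * (v_high ** 2 - v_low ** 2)
    return recharge_energy(c_rail, v_low, v_high) / saved_per_cycle
```

A call such as `breakeven_cycles(1e-10, 1e-10, 0.8, 1.0)` compares the one-off recharge cost against the per-cycle saving; a result below 1 would mean a single cycle at the lower voltage already pays off under these assumptions.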
Citations: 15
A high-performance parallel CAVLC encoder on a fine-grained many-core system
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751869
Zhibin Xiao, B. Baas
This paper presents a high-performance parallel context-based adaptive variable length coding (CAVLC) encoder implemented on a fine-grained many-core system. The software encoder is designed for an H.264/AVC baseline profile encoder. By utilizing arithmetic table elimination and compression techniques, the data-flow of the CAVLC encoder has been partitioned and mapped to an array of 15 small processors. The parallel workload of each processor is characterized and balanced for further throughput optimization. The proposed parallel CAVLC encoder achieves the real-time processing requirement of 30 frames per second for 720p HDTV. Our experiments show that the presented CAVLC encoder has 4.86 to 6.83 times higher throughput and requires far smaller chip area than the identical encoder implemented on state-of-the-art general-purpose processors. In comparison to published implementations on common DSP processors, the design has approximately 1.0 to 6.15 times higher throughput while requiring less than 6 times smaller area.
Citations: 23
A parallel Steiner tree heuristic for macro cell routing
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751836
C. Fobel, G. Grewal
Global routing of macro cells remains an important but time-consuming step in the VLSI design cycle. Macro cells are large, irregularly sized parameterized circuit modules that typically contain large numbers of terminals that must be interconnected. The interconnection pattern for each set of terminals (net) that must be connected is a Steiner tree, and the primary sub-problem in the global routing of macro cells is to find a set of dissimilar, low-cost Steiner trees for each net that must be routed. In this paper, a two-phase, parallel (multi-processor) algorithm is proposed for quickly constructing a diverse pool of high-quality Steiner trees for routing of multi-terminal nets. In the first phase, a single Steiner tree is constructed using a heuristic called Shrubbery. Then, in the second phase, a pool of dissimilar, high-quality trees is created from the original tree by running multiple instances of a local search in parallel. Computational experiments performed on over 800 commonly used benchmarks show that running multiple instances of the local search in parallel results in near-linear speed-up over the serial case. Most importantly, the trees produced are both high-quality and dissimilar, allowing for numerous routing possibilities for each net.
Citations: 3
A novel, highly SEU tolerant digital circuit design approach
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751834
Rajesh Garg, S. Khatri
In this paper, we present a new radiation tolerant CMOS standard cell library, and demonstrate its effectiveness in implementing radiation hardened digital circuits. We exploit the fact that if a gate is implemented using only PMOS (NMOS) transistors, then a radiation particle strike can result only in a logic 0 to 1 (1 to 0) flip. Based on this observation, we derive our radiation hardened gates from regular static CMOS gates. In particular, we separate the PMOS and NMOS devices, and split the gate output into two signals. One of these outputs of our radiation tolerant gate is generated using PMOS transistors, and it drives only other PMOS transistors in its fanout. Similarly, the other output (generated from NMOS transistors) drives only other NMOS transistors in its fanout. Now, if a radiation particle strikes one of the outputs of the radiation tolerant gate, then the gates in the fanout enter a high-impedance state, and hence preserve their output values. Our radiation hardened gates exhibit an extremely high degree of SEU tolerance, which is validated at the circuit level. Using these gates, we also implement circuit level hardening based on logical masking, to selectively harden those gates in a circuit which contribute most to the soft error failure of the circuit. The gates with a low probability of logical masking are replaced by SEU tolerant gates from our new library, such that the digital design achieves a 90% soft error rate reduction. Experimental results demonstrate that this reduction is achieved with a modest layout area and delay penalty of 62% and 29% respectively, for area-mapped designs. In contrast with existing approaches, our approach results in SEU immunity for extremely large critical charge values (>650fC).
Citations: 24
Understanding performance, power and energy behavior in asymmetric multiprocessors
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751903
Nagesh B. Lakshminarayana, Hyesoon Kim
Multiprocessor architectures are becoming popular in both desktop and mobile processors. Among multiprocessor architectures, asymmetric architectures show promise in saving energy and power. However, the performance and energy consumption behavior of asymmetric multiprocessors with desktop-oriented multithreaded applications has not been studied widely. In this study, we measure performance and power consumption in asymmetric and symmetric multiprocessors using real 8 and 16 processor systems to understand the relationships between thread interactions and performance/power behavior. We find that when the workload is asymmetric, using an asymmetric multiprocessor can save energy, but for most of the symmetric workloads, using a symmetric multiprocessor (with the highest clock frequency) consumes less energy.
Citations: 11
Configurable rectilinear Steiner tree construction for SoC and nano technologies
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751837
I. Jiang, Yen-Ting Yu
The rectilinear Steiner minimal tree (RSMT) problem is essential in physical design. Moreover, the variant constraints for fabrication issues, including obstacle avoidance, multiple routing layers, layer-specific routing directions, cannot be ignored during RSMT construction for modern SoC and nano technologies. This paper proposes a construction-by-correction approach for obstacle-avoiding preferred direction rectilinear Steiner tree construction. Experimental results show that our algorithm is promising and outperforms the state-of-the-art works.
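For orientation, a common baseline for rectilinear Steiner tree length is the rectilinear minimum spanning tree (RMST), whose length is known to be at most 1.5 times the optimal RSMT length. The sketch below is not the construction-by-correction algorithm from this paper, and it ignores obstacles, routing layers and preferred directions entirely; it only computes the RMST length with Prim's algorithm under the Manhattan metric. The function name `rmst_length` is a hypothetical helper.

```python
def rmst_length(points):
    """Length of a minimum spanning tree over points under Manhattan distance.

    points is a list of (x, y) tuples. Runs Prim's algorithm in O(n^2).
    """
    if not points:
        return 0
    x0, y0 = points[0]
    # dist[i] is the cheapest known connection from point i to the tree.
    dist = [abs(px - x0) + abs(py - y0) for px, py in points]
    in_tree = {0}
    total = 0
    while len(in_tree) < len(points):
        # Pick the closest point that is not yet connected.
        j = min((i for i in range(len(points)) if i not in in_tree),
                key=dist.__getitem__)
        total += dist[j]
        in_tree.add(j)
        jx, jy = points[j]
        # Relax distances through the newly added point.
        for i, (px, py) in enumerate(points):
            d = abs(px - jx) + abs(py - jy)
            if d < dist[i]:
                dist[i] = d
    return total
```

For the terminals (0, 0), (2, 0) and (1, 1), the RMST has length 4, whereas adding a Steiner point at (1, 0) connects all three with length 3. Closing that gap, under the fabrication constraints the abstract lists, is what Steiner tree construction is for.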
Citations: 2
A family of scalable FFT architectures and an implementation of 1024-point radix-2 FFT for real-time communications
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751880
A. Suleiman, H. Saleh, A. Hussein, D. Akopian
The paper presents a family of architectures for FFT implementation based on the decomposition of the perfect shuffle permutation, which can be designed with a variable number of processing elements. This provides designers with a trade-off choice of speed vs. complexity (cost and area). A detailed case study is provided on the implementation of 1024-point FFT with 2 processing elements using 45 nm process technology, including area, timing, power and place-and-route results.
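The perfect shuffle permutation named in this abstract is easy to state in code. The sketch below only illustrates the permutation itself and its index-rotation property; it does not reproduce the decomposition or the architectures from the paper, and the helper names `perfect_shuffle` and `rotate_left` are illustrative.

```python
def perfect_shuffle(seq):
    """Interleave the first and second halves of an even-length sequence."""
    n = len(seq)
    if n % 2:
        raise ValueError("length must be even")
    half = n // 2
    out = []
    for i in range(half):
        out.append(seq[i])         # element from the first half
        out.append(seq[i + half])  # matching element from the second half
    return out


def rotate_left(index, bits):
    """Cyclic left rotation of a bits-wide index."""
    mask = (1 << bits) - 1
    return ((index << 1) & mask) | (index >> (bits - 1))
```

For a sequence of length 2^bits, the element at position `i` moves to position `rotate_left(i, bits)`, so applying the shuffle `bits` times restores the original order. This regularity is what lets shuffle-based FFT dataflows reuse the same interconnect pattern at every stage.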
Citations: 16
Timing analysis considering IR drop waveforms in power gating designs
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751912
Shih-Hung Weng, Yu-Min Kuo, Shih-Chieh Chang, M. Marek-Sadowska
IR drop noise has become a critical issue in advanced process technologies. Traditionally, timing analysis in which the IR drop noise is considered assumes a worst-case IR drop for each gate; however, using this assumption provides unduly pessimistic results. In this paper, we describe a timing analysis approach for power gating designs. To improve the accuracy of the gate delay calculation we determine the virtual voltage level by taking into account the IR drop waveforms across the sleep transistors. These can be obtained efficiently using a linear programming approach. Our experimental results are very promising.
Citations: 5
Optimizing data sharing and address translation for the Cell BE Heterogeneous Chip Multiprocessor
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751904
M. Gschwind
Heterogeneous Chip Multiprocessors (HMPs), such as the Cell Broadband Engine, offer a new design optimization opportunity by allowing designers to provide accelerators for application specific domains. Data sharing between CPUs and accelerators, and memory access mechanisms and protocols are crucial decisions in the design of an HMP. In this article, we analyze the choices between hardware and software managed coherence between CPU and accelerators for DMA-based data sharing, and find that hardware-coherent DMA shows a performance benefit of up to 3x, even for simple workloads. We explore memory address translation architecture choices for DMA-based data sharing. In multiprogramming environments, address translation is commonly used to separate processes. For efficiency, direct access to system memory requires address translation capabilities in the accelerator. We find that hardware managed address translation shows a performance benefit of up to 5x, even for simple workloads, by avoiding the costs of accelerator/CPU communication and supervisor management of the translation context and the introduction of a serial bottleneck on the CPU.
Citations: 10