2007 25th International Conference on Computer Design: latest papers


Cluster-level simultaneous multithreading for VLIW processors

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601890

Manoj Gupta, F. Sánchez, J. Llosa

Clustered VLIW embedded processors have become widespread due to the benefits of simple hardware and low power. However, while some applications exhibit large amounts of instruction-level parallelism (ILP) and benefit from very wide machines, others have little ILP, which wastes precious resources in wide processors. Simultaneous multithreading (SMT) is a well-known technique that improves resource utilization by exploiting thread-level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures. In this paper, we propose CSMT (cluster-level simultaneous multithreading) to allow some degree of SMT in clustered VLIW processors with minimal hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster (named bundle) as the assignment unit. All bundles belonging to a VLIW instruction from a given thread are issued simultaneously. To minimize cluster conflicts between threads, a very simple hardware-based cluster renaming mechanism is proposed. The experimental results show that CSMT significantly improves ILP when compared with other multithreading approaches suited for VLIW. For instance, with 4 threads CSMT shows an average speedup of 113% over a single-thread VLIW architecture and 36% over interleaved multithreading (IMT). In some cases, speedup can be as high as 228% over a single-thread architecture and 97% over IMT.
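The issue rule described in the abstract can be illustrated with a small sketch. It is only a toy model under stated assumptions, not the paper's hardware: the abstract does not describe the renaming mechanism, so the per-thread rotation in `rename` is a hypothetical stand-in, and the fixed thread priority order, `issue_cycle`, and the mask representation are likewise assumptions made for illustration.

```python
from typing import List, Optional

Mask = List[bool]  # mask[c] is True when the instruction has a bundle for cluster c


def rename(mask: Mask, shift: int) -> Mask:
    """Rotate a cluster-usage mask by `shift` clusters.

    Hypothetical renaming scheme: the abstract only says the mechanism is
    simple and hardware-based, not how it maps clusters.
    """
    n = len(mask)
    return [mask[(c - shift) % n] for c in range(n)]


def issue_cycle(masks: List[Optional[Mask]], n_clusters: int,
                renaming: bool = True) -> List[int]:
    """Return the ids of the threads that issue in this cycle.

    masks[t] describes the next VLIW instruction of thread t (None if the
    thread has nothing to issue). A thread issues only if every one of its
    bundles finds a free cluster, so all bundles of an instruction are
    issued together. Threads are tried in fixed id order (an assumption).
    """
    busy = [False] * n_clusters
    issued = []
    for t, mask in enumerate(masks):
        if mask is None:
            continue
        m = rename(mask, t) if renaming else mask
        if any(b and u for b, u in zip(busy, m)):
            continue  # cluster conflict: the whole instruction waits
        busy = [b or u for b, u in zip(busy, m)]
        issued.append(t)
    return issued


if __name__ == "__main__":
    # Two threads whose next instructions both use only cluster 0.
    masks = [[True, False, False, False], [True, False, False, False]]
    print(issue_cycle(masks, 4, renaming=False))  # only thread 0 issues
    print(issue_cycle(masks, 4, renaming=True))   # both threads issue
```

In this toy model, two low-ILP threads that both occupy cluster 0 collide without renaming, while rotating the second thread's mask moves its bundle to a free cluster so both can share the cycle.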


Pages : 121-128

Citations: 8

Algorithms to simplify multi-clock/edge timing constraints

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601937

V. Nagbhushan, C. Y. Chen

The use of multiple clocks has become a common practice in modern microprocessor design. With multiple clocks, the timing specifications have become complicated and tend to go beyond the ability of single-clock based CAD tools. This paper first introduces the concept of timing specification transformation. Then, this paper describes algorithms for transforming an interface timing specification with multiple clocks/edges into an equivalent specification with a single clock/edge for combinational circuit blocks. It formulates a new optimization problem, which is important but has never been addressed by CAD researchers. It identifies conditions under which this transformation can be performed efficiently without any loss of timing budget. The algorithm can be used to simplify the constraints to drive many synthesis and optimization algorithms.


Citations: 1

VOSCH: Voltage scaled cache hierarchies

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601944

W. Wong, Cheng-Kok Koh, Yiran Chen, Hai Helen Li

The cache hierarchy of state-of-the-art microprocessors, especially multicore ones, consumes a significant amount of area and energy. A significant amount of research has been devoted to reducing the latter in particular. One of the most important microarchitectural techniques proposed for energy management is dynamic voltage scaling (DVS). In DVS solutions, each cache operates at a number of different voltages. Most of the research on DVS techniques has focused on how the voltages can be adjusted and tuned. In this paper, we depart from the use of DVS for energy conservation by examining static voltage assignments for caches. We propose the use of voltage scaled cache hierarchies (VOSCH) as a means to conserve both static and dynamic energy. In VOSCH, the caches are powered at progressively lower supply voltages as the cache level increases. Compared to DVS solutions, VOSCH is simple, potentially more robust, and can conserve more energy. We also experimented with more aggressive designs that included the addition of small cache structures to VOSCH. Even greater energy savings were achieved without having to sacrifice performance.


Citations: 15

Challenges and prospects of SDR for mobile phones

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601903

U. Ramacher

A nonvolatile semiconductor memory apparatus is provided which comprises a flip-flop circuit formed of a pair of MOS FETs and a pair of MNOS FETs coupled to the bistable output terminals of the flip-flop circuit, respectively. The memory apparatus further has a pair of MOS FETs coupled to have the current paths in parallel with the current paths of the pair of MOS FETs of the flip-flop circuit.


Citations: 1

A power gating scheme for ground bounce reduction during mode transition

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601929

Ku He, Rong Luo, Yu Wang

Power gating is an effective method to reduce leakage power during the circuit sleep mode; however, it introduces the ground bounce problem and consumes considerable energy during mode transitions. To mitigate the ground bounce, we propose a novel power gating scheme that reduces the magnitude of the peak current and voltage glitches as well as the time to stabilize power and ground during mode transitions. To further decrease the wakeup time while keeping the energy efficiency, we introduce two improved circuit schemes with two intermediate states, based on our proposed power gating scheme. The scheme provides an average peak voltage reduction of 67.0%, and the wakeup time reduction is up to 62.3%. If the circuits use the intermediate schemes, wakeup time can be further reduced by a maximum of 95.7%. Besides these reductions, our proposed circuit scheme also has the advantages of small size and flexible controllability.


Citations: 31

FPGA global routing architecture optimization using a multicommodity flow approach

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601893

Yuanfang Hu, Yi Zhu, M. Taylor, Chung-Kuan Cheng

Low energy and small switch area usage are two of the important design objectives in FPGA global routing architecture design. This paper presents an improved CAD flow, based on an MCF model, that performs aggressive optimizations, such as topology and wire style optimizations, to reduce the energy and switch area of FPGA global routing architectures. The experiments show that, compared to the traditional mesh architecture, the optimized FPGA routing architectures achieve energy savings of up to 10% to 15% and switch area savings of up to 20% on average for a set of seven benchmark circuits.


Citations: 2

On modeling impact of sub-wavelength lithography on transistors

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601884

Aswin Sreedhar, S. Kundu

As VLSI technology marches beyond the 65 and 45 nm process technologies, variation in gate length has a direct impact on the leakage and performance of CMOS transistors. Due to sub-wavelength lithography, the shape of the transistor often differs from idealized rectangles. In silicon, the effective channel length of a transistor varies across its width. This is a modeling problem. The average effective channel length is different for ON and OFF currents, making it difficult, if not impossible, for a single Leff to accurately represent both. In this paper, we report an accurate post-litho non-rectangular transistor modeling methodology. We further studied the impact of focus and dose variations in the lithographic process on transistor parameters. The resulting transistor models were applied to standard cell characterization in successive steps of lithographic simulation of layout and device characterization. Results show that the new models can improve the accuracy of leakage current estimation by 40% or more over a nominal model that is primarily tuned for ON current.


Citations: 11

Non-arithmetic carry chains for reconfigurable fabrics

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601892

Michael T. Frederick, Arun Kumar Somani

Reconfigurable fabrics cater to a wide variety of applications, but have adopted specialized components to allow efficient implementation of performance-critical arithmetic operations. Carry chains have typically been integrated into the fabric as an optimized ripple-carry chain. However, in non-arithmetic operations the carry chain goes unused, when it could be a valuable adjacent-cell interconnect resource. This paper presents a cell architecture facilitating reuse, as well as an analysis of the potential benefits of reuse for a sampling of common algorithms using commercial FPGAs. Technology mapping experiments indicate that a variety of applications can benefit from reuse, with utilized routing resources reduced by up to 13% and maximum clock frequency increased by up to 47%.


Citations: 4

Evaluating voltage islands in CMPs under process variations

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601891

Abhishek Das, S. Ozdemir, G. Memik, A. Choudhary

Parameter variations are a major factor causing power-performance asymmetry in chip multiprocessors. In this paper, we analyze the effects of within-die (WID) process variations on chip multicore processors and then apply a variable voltage island scheme to minimize power dissipation. Our idea is based on the observation that, due to process variations, the critical paths in each core are likely to have different latencies, resulting in core-to-core (C2C) variations. As a result, each core can operate correctly under a different supply voltage level, achieving an optimal power consumption level. In particular, we analyze voltage islands at different granularities ranging from a single core to a group of cores. We show that the dynamic power consumption can be reduced by up to 36.2% when each core can set its individual supply voltage level. In addition, for most manufacturing technologies, significant power savings can be achieved with only a few voltage islands on the whole chip: a single customized voltage setting can reduce the power consumption by up to 31.5%. Since the nominal operating frequency remains unchanged after the modifications, our scheme incurs no performance overhead.
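The granularity trade-off in this abstract can be made concrete with a back-of-the-envelope sketch. It assumes the standard first-order model in which dynamic power scales with the square of the supply voltage at fixed frequency, and identical switching activity and capacitance in every core. The per-core minimum voltages below are made-up illustrative values, not data from the paper, and the function names are hypothetical.

```python
from typing import List, Sequence


def island_voltages(vmin: Sequence[float],
                    islands: Sequence[Sequence[int]]) -> List[float]:
    """An island must run at the highest minimum voltage among its cores."""
    return [max(vmin[c] for c in island) for island in islands]


def dynamic_power_saving(vmin: Sequence[float],
                         islands: Sequence[Sequence[int]],
                         v_nominal: float) -> float:
    """Fractional dynamic power saving versus all cores at v_nominal.

    Assumes P_dyn is proportional to V^2 at fixed frequency and that all
    cores have the same activity and capacitance.
    """
    volts = island_voltages(vmin, islands)
    n_cores = sum(len(island) for island in islands)
    scaled = sum(len(island) * v ** 2 for island, v in zip(islands, volts))
    return 1.0 - scaled / (n_cores * v_nominal ** 2)


if __name__ == "__main__":
    vmin = [0.95, 0.90, 0.80, 0.90]  # illustrative per-core minimum voltages (V)
    for islands in ([[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0], [1], [2], [3]]):
        saving = dynamic_power_saving(vmin, islands, v_nominal=1.0)
        print(len(islands), "island(s):", round(saving * 100, 2), "% saving")
```

With these illustrative numbers, finer islands save more because fewer cores are held at the voltage of the slowest core in their group, which is the same qualitative trend the abstract reports between one chip-wide setting and per-core settings.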


Pages : 129-136

Citations: 22

Effective Dynamic Thermal Management for MPEG-4 decoding

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601962

Inchoon Yeo, Heung-Ki Lee, Eun Jung Kim, K. H. Yum

This paper proposes dynamic thermal management (DTM) based on a dynamic voltage and frequency scaling (DVFS) technique for MPEG-4 decoding to guarantee thermal safety while maintaining a quality of service (QoS) constraint. Although many low-power and low-temperature multimedia playback techniques have been proposed, most of them are impractical in real time and rely on several restrictive assumptions. Multimedia data consists of several frames requiring different decoding efforts. Since both the temperature and the performance of a multimedia system are affected by the complexity of scenes, our main idea is to use the information on scene complexity to find an appropriate frequency. In order to predict the complexity of the current scene, we extract information from the previous group of pictures (GOP) using feedback control with a display buffer. Experimental results with twelve movies show that our DTM scheme keeps the temperature within the threshold (70 °C) while maintaining a 0% frame miss ratio. Also, our DTM scheme decreases the average temperature by up to 13% without any additional hardware or playback latency.
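A rough sketch of the idea of choosing a frequency from the previous GOP with display-buffer feedback is given below. It is not the authors' controller: the frequency levels, the proportional correction, its gain, and the name `next_frequency` are all assumptions made for illustration, and temperature itself is not modelled here.

```python
from typing import Sequence

# Hypothetical DVFS levels in MHz; a real platform defines its own table.
FREQS_MHZ = (400, 600, 800, 1000)


def next_frequency(prev_gop_cycles: float, gop_period_s: float,
                   buffer_fill: float, target_fill: float = 0.5,
                   gain: float = 0.5,
                   freqs_mhz: Sequence[int] = FREQS_MHZ) -> int:
    """Pick the lowest frequency expected to decode the next GOP on time.

    prev_gop_cycles: cycles spent decoding the previous GOP, used as the
        prediction of the next GOP's complexity.
    gop_period_s: playback time covered by one GOP.
    buffer_fill: display buffer occupancy in [0, 1]; a buffer below the
        target asks for more speed, a buffer above it allows slowing down.
    """
    required_hz = prev_gop_cycles / gop_period_s
    correction = 1.0 + gain * (target_fill - buffer_fill)
    wanted_mhz = required_hz * correction / 1e6
    for f in freqs_mhz:
        if f >= wanted_mhz:
            return f
    return freqs_mhz[-1]  # demand exceeds the top level: run flat out


if __name__ == "__main__":
    print(next_frequency(2.8e8, 0.5, buffer_fill=0.5))   # buffer on target
    print(next_frequency(2.8e8, 0.5, buffer_fill=0.1))   # buffer draining
    print(next_frequency(2.25e8, 0.5, buffer_fill=0.9))  # buffer nearly full
```

Running at the lowest frequency that still meets the deadline is what keeps power, and therefore temperature, down in a scheme of this kind, while the buffer term guards against missed frames when the prediction from the previous GOP is too optimistic.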


Citations: 14
