Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors: latest articles

A floating point radix 2 shared division/square root chip

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528910

H. Srinivas, K. Parhi

This paper presents the architecture and implementation of a full-custom 1.2 micron CMOS VLSI chip that executes a shared division/square root algorithm operating on the mantissas (23 b in length) of single-precision IEEE 754 floating-point numbers. The division and square root algorithms used in this implementation are radix-2, signed-digit, digit-by-digit schemes. Both algorithms select each quotient/root digit using the two most-significant digits of the partial remainder and are hence faster than similar, previously proposed radix-2 shared division/square root schemes. In simulation, the chip runs at a clock rate of about 66 MHz at 5.0 V and requires 29 cycles per divide/square root operation, measured from the time the operands are provided at its input pins.
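
To make the digit-by-digit idea concrete, here is a minimal software sketch of a radix-2 division recurrence with quotient digits in {-1, 0, 1}. It is not the chip's datapath: the function name `srt_divide`, the operand range [1/2, 1), and the selection thresholds of ±1/2 on the full remainder are assumptions chosen for illustration, whereas the paper selects digits from the two most-significant signed digits of a redundant partial remainder.

```python
from fractions import Fraction

def srt_divide(x, d, steps=24):
    """Approximate x / d with a radix-2 signed-digit recurrence.

    Assumes x and d are Fractions in [1/2, 1). Each step produces one
    quotient digit in {-1, 0, 1} and updates the partial remainder w.
    """
    half = Fraction(1, 2)
    w = x / 2                      # initial remainder, |w| < 1/2 <= d
    q = Fraction(0)
    for j in range(1, steps + 1):
        shifted = 2 * w            # shift the partial remainder left
        if shifted >= half:
            digit = 1
        elif shifted < -half:
            digit = -1
        else:
            digit = 0
        w = shifted - digit * d    # recurrence keeps |w| <= d
        q += Fraction(digit, 2 ** j)
    return 2 * q                   # undo the initial halving of x

approx = srt_divide(Fraction(3, 4), Fraction(5, 8))
print(float(approx))               # close to 3/4 divided by 5/8 = 1.2
```

Exact rational arithmetic is used so that the error bound is easy to check: after `steps` iterations the result differs from x / d by at most 2^-(steps-1).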

Citations: 4

High level profiling based low power synthesis technique

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528906

S. Katkoori, Nand Kumar, R. Vemuri

We present a profiling-based technique for power estimation. This technique is implemented in the PDSS (Profile Driven Synthesis System) for the synthesis of low-power designs. Initially, each module in the module library is characterized for the average switching capacitance per input vector. The input description is simulated using a user-specified set of input vectors to collect the profile data for various operators and carriers. The profile data, in conjunction with the pre-characterized module library, is used to estimate the total capacitance switched by each of the valid schedules produced by the PDSS scheduler. A valid schedule is one that satisfies other constraints such as area and delay. The schedule with the lowest switching capacitance estimate is further synthesized to the layout level. Results show an average deviation of 12% compared with the actual switching capacitance values at the layout level.
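
The estimation step described above can be sketched roughly as follows. This is a simplified illustration, not the PDSS implementation: the module names, capacitance figures, profile counts, and the helpers `schedule_capacitance` and `pick_lowest_power` are all hypothetical, and a schedule is reduced here to a mapping from each operation to the library module that executes it.

```python
def schedule_capacitance(binding, profile, library):
    """Estimate total switched capacitance for one schedule.

    binding: operation -> module name (hypothetical representation)
    profile: operation -> number of activations seen during simulation
    library: module name -> average switched capacitance per input vector
    """
    return sum(profile[op] * library[module] for op, module in binding.items())

def pick_lowest_power(schedules, profile, library):
    """Return the name of the schedule with the smallest estimate."""
    return min(
        schedules,
        key=lambda name: schedule_capacitance(schedules[name], profile, library),
    )

# Hypothetical pre-characterized library (capacitance units are arbitrary).
library = {"ripple_adder": 1.5, "cla_adder": 2.5, "array_mult": 12.0}
# Hypothetical profile data collected from the user-specified input vectors.
profile = {"add1": 100, "add2": 40, "mul1": 10}
# Two hypothetical valid schedules that differ only in adder choice.
schedules = {
    "A": {"add1": "ripple_adder", "add2": "ripple_adder", "mul1": "array_mult"},
    "B": {"add1": "cla_adder", "add2": "cla_adder", "mul1": "array_mult"},
}

print(pick_lowest_power(schedules, profile, library))
```

In this toy data set, schedule A is estimated at 330 units and schedule B at 470, so A would be the one passed on to layout synthesis.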

Citations: 14

Control unit synthesis targeting low-power processors

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528907

Chuan-Yu Wang, K. Roy

With demands for reliability and further integration, reducing power consumption has become a critical concern in today's processor design. Among the different techniques for minimizing power consumption and improving system reliability, reducing the switching activity of CMOS circuits is a promising area to explore. Motivated by this, we propose two optimization schemes that can be incorporated into the synthesis of a processor's control unit to lower power dissipation. The first, a low-power decoding scheme, uses graph embedding and logic minimization techniques to refine the decoding structure in the control unit. To further optimize control units in nanoprogrammed or microprogrammed architectures, the second scheme optimally assigns ZERO or ONE to the don't-care bits distributed in nanocontrol memory or control memory, significantly reducing switching activity within the control unit and/or on the path from the control unit to the data processing unit. To achieve these two goals efficiently, we have used pseudo-Boolean programming to optimize the synthesis parameters. Based on a subset of the 8086 instruction set, experimental results show that a 15.8 percent improvement is obtained by properly encoding instruction opcodes, and a 4.9 to 16.6 percent improvement can be obtained from an optimal don't-care bit assignment.
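
As a rough illustration of why don't-care assignment affects switching activity, the sketch below fills the don't-care entries of a single control-memory bit column so that the number of 0/1 transitions between consecutive words is as small as possible. It is a deliberately simplified stand-in, not the pseudo-Boolean formulation used in the paper: it assumes the words are read in the listed order, treats each bit column independently, and the names `fill_dont_cares` and `transitions` are hypothetical.

```python
def fill_dont_cares(column):
    """Fill 'x' entries in one bit column to minimize adjacent transitions.

    column: string over '0', '1', 'x', one character per consecutive
    control word. Each 'x' copies the nearest specified bit before it;
    leading 'x' entries copy the first specified bit (or '0' if none).
    """
    bits = list(column)
    last = next((b for b in bits if b != "x"), "0")
    for i, b in enumerate(bits):
        if b == "x":
            bits[i] = last
        else:
            last = b
    return "".join(bits)

def transitions(column):
    """Count positions where consecutive bits differ."""
    return sum(a != b for a, b in zip(column, column[1:]))

filled = fill_dont_cares("1x0xx1x")
print(filled, transitions(filled))   # 1100011 2
```

Under these assumptions, every transition that remains is forced by two specified bits that differ, so no other assignment of the don't-cares in this column can do better.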

Citations: 3

Systolic algorithms for tree pattern matching

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528937

A. Ejnioui, N. Ranganathan

The objective of tree matching is to find the set of nodes at which a pattern tree matches a subject tree. Several sequential and parallel algorithms have been proposed in the literature for this compute-bound problem. Most of the parallel algorithms are based on the theoretical PRAM model of computation. In this paper, we propose two efficient parallel algorithms for tree pattern matching based on the linear systolic array model. The algorithms can be mapped onto any SIMD machine. The algorithms require O(n+m) time to perform the matching using either n or m processors, where n is the size of the subject tree and m is the size of the pattern tree. From an implementation standpoint, the algorithms represent a significant improvement over existing ones.
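
For readers unfamiliar with the problem, the following sketch shows a naive sequential baseline, not the systolic algorithms proposed in the paper. The tree encoding as `(label, children)` tuples, the `"*"` wildcard leaf, and the function names are assumptions made for this example; the baseline takes O(n·m) time in the worst case.

```python
def matches(pattern, subject):
    """Return True if the pattern tree matches at the root of subject.

    Trees are (label, children) tuples. A pattern node labelled "*" is
    assumed to match any subtree.
    """
    p_label, p_children = pattern
    s_label, s_children = subject
    if p_label == "*":
        return True
    if p_label != s_label or len(p_children) != len(s_children):
        return False
    return all(matches(p, s) for p, s in zip(p_children, s_children))

def match_positions(pattern, subject):
    """Return preorder indices of subject nodes where the pattern matches."""
    hits = []
    index = 0
    stack = [subject]
    while stack:
        node = stack.pop()
        if matches(pattern, node):
            hits.append(index)
        index += 1
        stack.extend(reversed(node[1]))   # visit children left to right
    return hits

subject = ("f", [("g", [("a", []), ("b", [])]),
                 ("g", [("a", []), ("a", [])])])
pattern = ("g", [("a", []), ("*", [])])
print(match_positions(pattern, subject))   # [1, 4]
```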

Citations: 5

Accurate and efficient layout-to-circuit extraction for high-speed MOS and bipolar/BiCMOS integrated circuits

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528834

F. Beeftink, A. V. Genderen, N. V. D. Meijs

In this paper, we describe how we have exploited the advantages of various methods for device recognition and modeling in a layout-to-circuit extractor, called Space. Hence, we have obtained a program that, for different technologies, can quickly translate a large layout into an equivalent network. The network includes layout parasitics of the interconnects and can directly be simulated by various simulation packages, such as Spice. The efficiency and accuracy of the extractor are confirmed by experimental results and enable a fast and reliable layout verification for both MOS and bipolar/BiCMOS technologies.

Citations: 7

A CMOS wave-pipelined image processor for real-time morphology

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528935

R. Krishnamurthy, R. Sridhar

This paper presents the implementation of a high-speed morphological image processor using CMOS wave-pipelining. A modular and expandable architecture, based on wave-pipelined transmission gate logic, has been developed for gray-scale and binary morphological operators. Using this architecture, 3×3 (2-dimensional) structuring element binary dilation and erosion units, and a two-stage morphological skeleton transform filter have been implemented in CMOS 1.2 μm technology. The operating frequency is 333 MHz, which exceeds the speeds reported in literature for this functionality. Simulation results indicate a speed-up of 4-5 compared to non-pipelined processor implementations. The wave-pipelined implementation also offers a significant reduction in latency and hardware complexity compared to regular pipelined architectures.
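
The binary operators named above can be illustrated in software with a short sketch. It shows only the functional behaviour of dilation and erosion with a full 3×3 structuring element, not the wave-pipelined hardware; treating pixels outside the image as 0 and the helper names are assumptions of this example.

```python
def _window(image, r, c):
    """Yield the 3x3 neighbourhood of pixel (r, c); outside pixels count as 0."""
    rows, cols = len(image), len(image[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            yield image[rr][cc] if inside else 0

def dilate(image):
    """Binary dilation: a pixel becomes 1 if any pixel in its window is 1."""
    return [[int(any(_window(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]

def erode(image):
    """Binary erosion: a pixel stays 1 only if every pixel in its window is 1."""
    return [[int(all(_window(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]

# A 5x5 image with a single set pixel in the centre.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 1

for row in dilate(image):
    print(row)
```

Dilating the single centre pixel produces a 3×3 block of ones, and eroding that block returns the original image.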

Citations: 5

Statistical generalization: theory and applications

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528783

B. Wah, Arthur Ieumwananonthachai, Shu Yao, T. Yu

In this paper, we discuss a new approach to generalize heuristic methods (HMs) to new test cases of an application, and conditions under which such generalization is possible. Generalization is difficult when performance values of HMs are characterized by multiple statistical distributions across subsets of test cases of an application. We define a new measure called probability of win and propose three methods to evaluate it: interval analysis, maximum likelihood estimate, and Bayesian analysis. We show experimental results on new HMs found for blind equalization and branch-and-bound search.

Citations: 3

Statistics on concurrent fault and design error simulation

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528933

B. Grayson, S. Shaikh, S. Szygenda

Basic data of the nature presented here on fault and design error simulation processes have not been previously reported. Experiments are performed on c-sim, a gate level concurrent simulator developed at the University of Texas at Austin. Three types of statistics are considered: event based statistics, gate evaluation statistics and memory requirements. These statistics are important for design verification researchers and engineers for numerous reasons. For example, they help simulator developers tune up or optimize their concurrent simulators. They also fulfill the increasing need for experimental data concerning design error simulation. Most importantly, these statistics provide guidance to hardware accelerator designers in evaluating and comparing various design options.

Citations: 2

An efficient cut-based algorithm on minimizing the number of L-shaped channels for safe routing ordering

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528835

Jin-Tai Yan

In this paper, based on the assumptions of the geometrical topology in a floorplan graph and the precedence relations in a channel precedence graph, the cuts are further classified into S-cuts, redundant L-cuts, balanced L-cuts, non-minimal L-cuts, non-critical L-cuts and critical L-cuts. An efficient cut-based algorithm for minimizing the number of L-shaped channels is proposed. The time complexity of the algorithm is proved to be O(n), where n is the number of line segments in a floorplan graph. Finally, several examples have been tested on Dai's and Cai's algorithms and the proposed algorithm. The experimental results show that, when defining straight and L-shaped channels for the assignment of safe routing ordering, the proposed algorithm defines fewer L-shaped channels than Dai's and Cai's algorithms.

Citations: 3

PEPPER-a timing driven early floorplanner

Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors

Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528815

Vinod Narayananan, D. LaPotin, Rajesh K. Gupta, G. Vijayan

With increasing chip complexities and the requirement to reduce design time, early analysis is becoming increasingly important in the design of performance-critical CMOS chips. As clock rates increase rapidly, interconnect delay consumes an appreciable portion of the chip cycle time, and the floorplan of the chip significantly affects its performance. This paper describes a system for early floorplan analysis of large designs. The floorplanner is designed to be used in the early stages of system design, to optimize performance, area and wireability targets before detailed implementation decisions are made. Most floorplanners that claim to optimize timing consider only a subset of paths during the floorplanning process. One novel feature of our floorplanner is that it performs static timing analysis during the floorplan optimization process, instead of working on a subset of the paths. The floorplanner incorporates various interactive and automatic floorplanning capabilities. The paper describes the floorplanning capabilities and algorithms as well as our experiences in using the tool.
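
To show what static timing analysis covers that a path subset does not, the sketch below propagates worst-case arrival times through an entire block-level graph in one topological pass. It is a generic textbook-style illustration, not PEPPER's algorithm: the block names, delay values, and the `arrival_times` helper are hypothetical, and the wire delays stand in for interconnect delays that a floorplan would determine.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def arrival_times(block_delay, wires):
    """Compute the latest signal arrival time at the output of each block.

    block_delay: block name -> internal delay
    wires: (source, destination) -> interconnect delay for that connection
    Every path is covered implicitly because each block takes the
    maximum over all of its incoming connections.
    """
    preds = {block: [] for block in block_delay}
    for src, dst in wires:
        preds[dst].append(src)
    arrival = {}
    for block in TopologicalSorter(preds).static_order():
        latest_input = max(
            (arrival[p] + wires[(p, block)] for p in preds[block]),
            default=0.0,
        )
        arrival[block] = latest_input + block_delay[block]
    return arrival

# Hypothetical four-block design; delays are in arbitrary time units.
block_delay = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
wires = {("A", "C"): 0.5, ("B", "C"): 1.0, ("C", "D"): 0.5, ("A", "D"): 4.0}

print(arrival_times(block_delay, wires))
```

In this toy design the long wire from A to D, not the path through C, sets the arrival time at D, which is the kind of floorplan-dependent effect that a fixed subset of paths could miss.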

Citations: 7
