
Proceedings Fourth International Symposium on Advanced Research in Asynchronous Circuits and Systems: Latest Papers

The design of an asynchronous TinyRISC™ TR4101 microprocessor core
K. T. Christensen, P. Jensen, P. Korger, J. Sparsø
This paper presents the design of an asynchronous version of the TR4101 embedded microprocessor core developed by LSI Logic Inc. The asynchronous processor, called ARISC, was designed using the same CAD tools and the same standard cell library that were used to implement the TR4101. The paper reports on the design methodology, the architecture, the implementation, and the performance of the ARISC. This includes a comparison with the TR4101 and a detailed breakdown of the power consumption in the ARISC. ARISC is our first attempt at an asynchronous implementation, and a number of simplifying decisions were made up front. Throughout the entire design we use four-phase handshaking in combination with a normally opaque latch controller. All logic is implemented using static logic standard cells. Despite this, the ARISC performs surprisingly well: in 0.35 µm CMOS, performance is 74-123 MIPS depending on the instruction mix, and at 74 MIPS the power efficiency is 635 MIPS/Watt.
DOI: https://doi.org/10.1109/ASYNC.1998.666498 | Published: 1998-03-30
Citations: 26
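The two figures quoted in the ARISC abstract (74 MIPS at 635 MIPS/Watt) imply a power draw and an energy per instruction that the abstract does not state directly. The arithmetic can be sketched in Python as follows; the function names are illustrative and not taken from the paper, and the results are only as precise as the rounded figures in the abstract.

```python
def power_watts(mips: float, mips_per_watt: float) -> float:
    """Power implied by a throughput (MIPS) and an efficiency (MIPS/Watt)."""
    return mips / mips_per_watt


def energy_per_instruction_nj(mips_per_watt: float) -> float:
    """Energy per instruction in nanojoules.

    One watt sustains mips_per_watt * 1e6 instructions per second, so one
    instruction costs 1 / (mips_per_watt * 1e6) joules.
    """
    return 1e9 / (mips_per_watt * 1e6)


if __name__ == "__main__":
    print(f"{power_watts(74, 635) * 1e3:.1f} mW")
    print(f"{energy_per_instruction_nj(635):.2f} nJ per instruction")
```

With the abstract's numbers this works out to roughly 117 mW and about 1.6 nJ per instruction at the 74 MIPS operating point.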
Asynchronous circuits and systems in superconducting RSFQ digital technology
Z. J. Deng, S. Whiteley, T. Duzer, J. Tierno
Superconductive Rapid Single Flux Quantum (RSFQ) logic and memory, in which ones and zeros are represented by the presence or absence, within a timing window, of a quantized picosecond voltage pulse (∫v(t)dt = h/2e = 2.07 mV·ps, corresponding to one SFQ), can be integrated into a digital computing system with an operating rate of several tens of GHz, based on the present Nb Josephson junction integrated circuit technology. It is the most promising technology beyond semiconductor transistors for low-power high-end computation. However, as the operating speed of circuits and systems increases, timing uncertainty from fabrication process variations makes global synchronization very hard. In this paper, we present a globally asynchronous, locally synchronous timing methodology for RSFQ digital design, which can solve the global synchronization problem. We also demonstrate the recent experimental results of some asynchronous circuits and systems implemented in RSFQ technology. Several key components such as a self-timed shift register, a self-timed demultiplexor, a Muller-C element, a completion detector, and a clock generator have been designed and tested. High-speed operation has been confirmed up to 20 Gb/s for a prototype data buffer system, which consists of two self-timed shift registers and an on-chip 5-38 GHz clock generator.
DOI: https://doi.org/10.1109/ASYNC.1998.666512 | Published: 1998-03-30
Citations: 5
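The pulse area quoted in the RSFQ abstract, h/2e = 2.07 mV·ps, is the magnetic flux quantum and can be checked from the SI defining constants. The Python sketch below does only that unit conversion; the constant and function names are chosen here for illustration.

```python
# Exact SI defining values (2019 redefinition of the SI).
PLANCK_H = 6.62607015e-34            # joule-seconds
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def flux_quantum_wb() -> float:
    """Magnetic flux quantum h/2e in webers (volt-seconds)."""
    return PLANCK_H / (2.0 * ELEMENTARY_CHARGE)


def flux_quantum_mv_ps() -> float:
    """Same quantity in mV*ps: 1 V*s = 1e3 mV * 1e12 ps = 1e15 mV*ps."""
    return flux_quantum_wb() * 1e15


if __name__ == "__main__":
    print(f"{flux_quantum_mv_ps():.2f} mV*ps")
```

The result rounds to 2.07 mV·ps, matching the abstract: a pulse about 1 ps wide therefore has an amplitude on the order of 2 mV.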
Asynchronous macrocell interconnect using MARBLE
William John Bainbridge, S. Furber
This paper introduces MARBLE, the Manchester AsynchRonous Bus for Low Energy, a two-channel micropipeline bus with centralized arbitration and address decoding which provides for the interconnection of asynchronous VLSI macrocells. In addition to basic bus functionality, MARBLE supports bus-bridging and test access, demonstrating that all the functions of a high-speed macrocell bus can be implemented efficiently in a fully asynchronous design style. MARBLE is used in the AMULET3i microprocessor to connect the CPU core and DMA controller to RAM, ROM and peripherals. It exploits pipelining of the arbitration, address and data cycles, together with spatial locality optimizations and in-order split transfers, to supply the bandwidth requirements of such a system. The design of a MARBLE initiator data interface used in the AMULET3i is presented, including a Petri-net specification suitable for synthesis using the Petrify tool.
DOI: https://doi.org/10.1109/ASYNC.1998.666499 | Published: 1998-03-30
Citations: 43
A low-power, low noise, configurable self-timed DSP
N. Paver, P. Day, C. Farnsworth, D. L. Jackson, W. A. Lien, Jianwei Liu
This paper describes a commercial implementation of a self-timed DSP. The self-timed design is fully compatible with a synchronous implementation, allowing comparisons of both design styles to be made. The self-timed implementation has shown many benefits over its synchronous counterpart, especially with regard to power consumption and noise emissions. It also demonstrates the commercial viability of self-timed designs in power- and noise-sensitive applications. This paper also introduces the concept of a highly configurable Application Specific Integrated Architecture (ASIA™).
DOI: https://doi.org/10.1109/ASYNC.1998.666492 | Published: 1998-03-30
Citations: 52
A FIFO data switch design experiment
William S. Coates, J. Lexau, I. W. Jones, Scott M. Fairbanks, I. Sutherland
A core problem in many pipelined circuit designs is data-dependent data flow. We describe a methodology and a set of circuit modules to address this problem in the asynchronous domain. We call our methodology P**3, or "P cubed". Items flowing through a set of FIFO datapaths can be conditionally steered under the control of data carried by other FIFOs. We have used the P**3 methodology to design and implement a FIFO test chip that uses a data-dependent switch to delete marked data items conditionally. The circuit uses two on-chip FIFO rings as high-speed data sources. It was fabricated through MOSIS using their 0.6 µm CMOS design rules. The peak data switch throughput was measured to be a minimum of 580 million data items per second at nominal Vdd of 3.3 V.
DOI: https://doi.org/10.1109/ASYNC.1998.666490 | Published: 1998-03-30
Citations: 9
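The data-dependent switch in the FIFO abstract deletes marked items under the control of data carried by another FIFO. A purely behavioural Python sketch of that idea is given below. It models only the steering decision, not the P**3 circuit modules or their handshaking, and the function and parameter names are hypothetical.

```python
from collections import deque
from typing import Deque, List, TypeVar

T = TypeVar("T")


def conditional_delete(data: Deque[T], marks: Deque[bool]) -> List[T]:
    """Pair each data item with one control item and drop the marked ones.

    Both queues are consumed from the head in lockstep, mimicking a switch
    that waits for one item on the data FIFO and one on the control FIFO
    before deciding whether to pass or delete the data item.
    """
    kept: List[T] = []
    while data and marks:
        item = data.popleft()
        delete = marks.popleft()
        if not delete:
            kept.append(item)
    return kept


if __name__ == "__main__":
    data_fifo = deque("abcde")
    control_fifo = deque([False, True, False, True, False])
    print(conditional_delete(data_fifo, control_fifo))
```

In this example the items paired with a true mark ("b" and "d") are removed, and the remaining items leave in their original order.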
Verifying a self-timed divider
Tarik Ono-Tesfaye, Christoph Kern, M. Greenstreet
This paper presents an approach to verifying timed designs based on refinement: first, correctness is established for a speed-independent model; then, the timed design is shown to be a refinement of this model. Although this approach is less automatic than methods based on timed state space enumeration, it is tractable for larger designs. Our method is implemented using a proof checker with a built-in model checker for verifying properties of high-level models, a tautology checker for establishing refinement, and a graph-based timing verification procedure for showing timing properties of transistor level models. We demonstrate the method by proving the timing correctness of Williams' self-timed divider.
DOI: https://doi.org/10.1109/ASYNC.1998.666501 | Published: 1998-03-30
Citations: 4
Building finite automata from DI specifications
W. C. Mallon, J. T. Udding
Numerous formalisms exist to specify delay-insensitive computations and their implementations. It is not always straightforward to compare specifications in the different formalisms. One way of comparing specifications is transforming them to automata in which nodes are annotated with progress requirements. In this paper we present an algorithm that transforms DI-algebra recursive process expressions into finite automata. In doing so we develop an operational semantics for DI-algebra. The algorithm has been proven correct, and we highlight the most interesting aspects of that proof. The algorithm has been implemented and turns out to be very valuable in the process of getting a specification right.
DOI: https://doi.org/10.1109/ASYNC.1998.666504 | Published: 1998-03-30
Citations: 13
An asynchronous PRBS error checker for testing high-speed self-clocked serial links
P. T. Røine
Pseudo-random bit sequences (PRBS) are commonly used to determine the bit error rate (BER) of serial communication links. On self-clocked links in an asynchronous environment, the data rate may vary over time. An asynchronous PRBS error checker was designed for BER measurements on such links working at data rates exceeding 1 Gbps. To achieve the highest possible speed, the error checker employs a self-timed ring structure with distributed completion detection.
DOI: https://doi.org/10.1109/ASYNC.1998.666500 | Published: 1998-03-30
Citations: 2
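The PRBS abstract does not say which sequence length or checker structure the design uses. As a software illustration of the general technique only, the Python sketch below assumes a 7-bit maximal-length recurrence, b[n] = b[n-7] XOR b[n-6], and a self-synchronizing checker that predicts each bit from the previous seven received bits. Both choices and all names are assumptions made here, not details from the paper.

```python
from collections import deque
from itertools import islice
from typing import Iterable, Iterator, List


def prbs7(seed: Iterable[int] = (1, 1, 1, 1, 1, 1, 1)) -> Iterator[int]:
    """Yield bits of the recurrence b[n] = b[n-7] XOR b[n-6] (period 127)."""
    history = deque(seed, maxlen=7)  # history[0] is the oldest bit, b[n-7]
    if len(history) != 7 or not any(history):
        raise ValueError("seed must be 7 bits and not all zero")
    while True:
        bit = history[0] ^ history[1]
        history.append(bit)  # maxlen=7 discards the oldest bit
        yield bit


def count_mismatches(received: Iterable[int]) -> int:
    """Count bits that disagree with the prediction from the last 7 received bits."""
    history: deque = deque(maxlen=7)
    mismatches = 0
    for bit in received:
        if len(history) == 7:  # wait until the window is full before checking
            if bit != history[0] ^ history[1]:
                mismatches += 1
        history.append(bit)
    return mismatches


if __name__ == "__main__":
    clean: List[int] = list(islice(prbs7(), 500))
    corrupted = clean.copy()
    corrupted[100] ^= 1  # inject a single bit error
    print(count_mismatches(clean), count_mismatches(corrupted))
```

Because the checker derives its reference from the received stream, it needs no seed alignment with the transmitter. The cost is that one corrupted bit also feeds two later predictions, so a single injected error shows up as three mismatches, and a BER estimate built on this scheme has to correct for that multiplication.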
A fast asynchronous Huffman decoder for compressed-code embedded processors
Martin Benes, S. Nowick, A. Wolfe
This paper presents the architecture and design of a high-performance asynchronous Huffman decoder for compressed-code embedded processors. In such processors, embedded programs are stored in compressed form in instruction ROM and are then decompressed on demand during instruction cache refill. The Huffman decoder is used as a code decompression engine. The circuit is non-pipelined, and is implemented as an iterative self-timed ring. It achieves a high-speed decode rate with very low area overhead. Simulations using Lsim show an average throughput of 32 bits/25 ns on the output side (or 163 MBytes/sec, or 1303 Mbit/sec), corresponding to about 889 Mbit/sec on the input side. The area of the design is extremely small: under 1 mm² in a 0.8 micron full-custom layout. The decoder is estimated to have higher throughput than any comparable synchronous Huffman decoder (after normalizing for feature size and voltage), yet is much smaller than synchronous designs. Its performance is also 83% faster than a recently published asynchronous Huffman decoder using the same technology.
DOI: https://doi.org/10.1109/ASYNC.1998.666493 | Published: 1998-03-30
Citations: 52
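The Huffman decoder abstract describes an iterative engine that turns variable-length codewords back into fixed-width output. What such a decoder computes can be sketched in Python as below. The code table is a made-up prefix-free code for illustration, not the one used in the paper, and the bit-at-a-time loop says nothing about how the self-timed ring is organized.

```python
from typing import Dict, List

# Hypothetical prefix-free code: shorter codewords stand for more frequent bytes.
CODE_TABLE: Dict[str, int] = {
    "0": 0x00,
    "10": 0x8F,
    "110": 0x24,
    "111": 0xFF,
}


def huffman_decode(bits: str, table: Dict[str, int] = CODE_TABLE) -> List[int]:
    """Decode a string of '0'/'1' characters into a list of byte values.

    Bits are accumulated until they match a codeword. Because the code is
    prefix-free, the first match is always the correct symbol boundary.
    """
    output: List[int] = []
    prefix = ""
    for bit in bits:
        prefix += bit
        if prefix in table:
            output.append(table[prefix])
            prefix = ""
    if prefix:
        raise ValueError("input ends in the middle of a codeword")
    return output


if __name__ == "__main__":
    decoded = huffman_decode("0101100111")
    print([hex(b) for b in decoded])
```

Here 10 input bits expand to 5 output bytes (40 bits). The abstract's own figures, 889 Mbit/sec in and 1303 Mbit/sec out, reflect the same kind of expansion on real compressed code.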
Average-case optimized transistor-level technology mapping of extended burst-mode circuits
K. W. James, K. Yun
We describe an automated method (3D-map) for determining near-optimal decomposed generalized C-element (gC) implementations of extended burst-mode asynchronous controllers. Average-case optimization is performed so that frequent paths are accelerated, possibly at the expense of less frequent paths. The overall effect, as quantified using Elmore delay analysis, is a circuit that has near-optimal performance for the average or common case.
DOI: https://doi.org/10.1109/ASYNC.1998.666495 | Published: 1998-03-30
Citations: 20
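The technology-mapping abstract quantifies its results using Elmore delay analysis. For readers unfamiliar with the metric, the Python sketch below computes the Elmore delay at the far end of a simple RC ladder, where each capacitor is weighted by the total resistance between it and the driver. It illustrates the metric only. The gC transistor networks analysed by 3D-map are not modelled, and the function name and component values are assumptions.

```python
from typing import Sequence


def elmore_delay_ladder(resistances: Sequence[float],
                        capacitances: Sequence[float]) -> float:
    """Elmore delay (seconds) at the last node of an RC ladder.

    resistances[i] is the series resistance (ohms) feeding node i, and
    capacitances[i] is the capacitance (farads) from node i to ground.
    The delay is the sum over nodes of C_i times the resistance upstream of it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


if __name__ == "__main__":
    # Three identical segments: 100 ohms in series, 1 fF to ground at each node.
    delay = elmore_delay_ladder([100.0] * 3, [1e-15] * 3)
    print(f"{delay * 1e12:.2f} ps")
```

For the three-segment example the terms are 100 Ω·1 fF + 200 Ω·1 fF + 300 Ω·1 fF = 0.6 ps, which shows why capacitance far from the driver costs more than capacitance close to it.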