Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors: Latest Articles


An analysis of the CORDIC algorithm for direct digital frequency synthesis

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors

Pub Date : 2002-07-17 DOI: 10.1109/ASAP.2002.1030709

C. Kang, E. Swartzlander

The circular-mode CORDIC (coordinate rotation digital computer) algorithm is analyzed for DDFS (direct digital frequency synthesis) applications. It is shown how the CORDIC parameters should be chosen to meet given DDFS parameters. Three methods of CORDIC datapath quantization (rounding, truncation, and jamming) are investigated, and their error bounds are derived. A set of simulations demonstrates that jamming has desirable characteristics in complexity, speed, error, and bias. Finally, it is shown that the CORDIC output can be made exact to the digits by an additional rounding process, which is especially useful for DDFS applications where the CORDIC output must be truncated to the final DAC (digital-to-analog converter) width.
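As a rough illustration of the datapath this abstract discusses, the following is a minimal Python sketch of rotation-mode circular CORDIC with a fixed-point x/y datapath and a selectable quantization of each right shift. The function names, the default word sizes, and the floating-point angle accumulator are assumptions made for this sketch, not details from the paper. In particular, "jamming" is modeled here as forcing the least significant kept bit to 1, which is one common definition; the paper's exact definition may differ.

```python
import math


def shift_right(v, s, mode):
    """Quantized arithmetic right shift of the integer v by s bits."""
    if s == 0:
        return v                           # nothing is discarded
    if mode == "truncate":
        return v >> s                      # drop the low bits (floor)
    if mode == "round":
        return (v + (1 << (s - 1))) >> s   # add half an LSB, then drop
    if mode == "jam":
        return (v >> s) | 1                # assumed variant: force LSB to 1
    raise ValueError(f"unknown mode: {mode}")


def cordic_sin_cos(theta, iterations=16, frac_bits=16, mode="truncate"):
    """Return (cos(theta), sin(theta)) for |theta| <= pi/2."""
    scale = 1 << frac_bits
    # Pre-compensate the CORDIC gain so the final vector has unit length.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y = round(k * scale), 0
    z = theta  # the angle path is kept in floating point for simplicity
    for i in range(iterations):
        d = 1 if z >= 0 else -1            # rotate toward z = 0
        dx = shift_right(y, i, mode)
        dy = shift_right(x, i, mode)
        x, y = x - d * dx, y + d * dy
        z -= d * math.atan(2.0 ** -i)
    return x / scale, y / scale


if __name__ == "__main__":
    for m in ("truncate", "round", "jam"):
        c, s = cordic_sin_cos(0.5, mode=m)
        print(m, c - math.cos(0.5), s - math.sin(0.5))
```

Running the script prints the cosine and sine errors for each quantization mode at one angle, which gives a feel for how the three modes differ; it does not reproduce the paper's error bounds.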


Citations: 20

Fast radix-4 retimed division with selection by comparisons

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors

Pub Date : 2002-07-17 DOI: 10.1109/ASAP.2002.1030718

E. Antelo, T. Lang, P. Montuschi, A. Nannarelli

Since a large portion of the critical path in an implementation of radix-4 division corresponds to the delay of the quotient-digit selection module, it is of interest to reduce this delay. This paper extends a recently presented approach that prestores the selection constants corresponding to the actual value of the divisor and determines the quotient digit by carry-free subtraction and sign detection. The extension consists of advancing the subtraction so that it lies outside the critical path. This advancement also makes it possible to place the registers so as to minimize the cycle time. We present the method and report the results of synthesis using a family of standard cells. We conclude that the extension yields a speedup of 1.35 with respect to the basic implementation and of 1.3 with respect to the previously mentioned approach. We estimate that the areas of all three units are about the same.
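To make the idea of quotient-digit selection by comparison more concrete, here is a minimal Python sketch of a radix-4 digit recurrence with digit set {-2, ..., 2}. It is not the paper's design: it uses exact rational arithmetic instead of carry-free subtraction and sign detection, and the comparison thresholds (odd multiples of d/2) are an illustrative choice rather than the prestored constants the authors use. The names `select_digit` and `radix4_divide` are hypothetical.

```python
from fractions import Fraction


def select_digit(w4, d):
    """Pick q in {-2, ..., 2} by comparing 4*w against multiples of d."""
    if w4 >= Fraction(3, 2) * d:
        return 2
    if w4 >= Fraction(1, 2) * d:
        return 1
    if w4 >= -Fraction(1, 2) * d:
        return 0
    if w4 >= -Fraction(3, 2) * d:
        return -1
    return -2


def radix4_divide(x, d, digits):
    """Return (q, w) such that x / d == q + w * 4**-digits / d."""
    x, d = Fraction(x), Fraction(d)
    if d <= 0 or abs(x) > Fraction(2, 3) * d:
        raise ValueError("need d > 0 and |x| <= 2d/3")
    w = x                      # residual w[0]
    q = Fraction(0)
    for j in range(1, digits + 1):
        w4 = 4 * w             # shift the residual by one radix-4 digit
        digit = select_digit(w4, d)
        w = w4 - digit * d     # recurrence: w[j] = 4*w[j-1] - q_j * d
        q += Fraction(digit, 4 ** j)
    return q, w


if __name__ == "__main__":
    q, w = radix4_divide(Fraction(1, 2), Fraction(3, 4), 8)
    print(float(q), float(w))
```

With these thresholds the residual stays within 2d/3 in magnitude at every step, so after n digits the quotient error is at most (2/3)·4^(-n). In hardware the comparisons would be made on a few bits of a redundant residual, which is where the selection delay discussed in the abstract arises.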


Citations: 10

Implementation of a 32-bit RISC processor for the data-intensive architecture processing-in-memory chip

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors

Pub Date : 2002-07-17 DOI: 10.1109/ASAP.2002.1030716

J. Draper, J. Sondeen, S. Mediratta, Ihn Kim

The Data-Intensive Architecture (DIVA) system employs Processing-In-Memory (PIM) chips as smart-memory coprocessors to a microprocessor. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications, including multimedia applications and pointer-based and sparse-matrix computations. The DIVA project is building a prototype workstation-class system using PIM chips in place of standard DRAMs to demonstrate these concepts. We have recently completed initial testing of the first version of the prototype PIM device. A key component of this architecture is the scalar processor that coordinates all activity within a PIM node. Since such a component is present in each PIM node, we exploit parallelism to achieve significant speedups rather than relying on costly, high-performance processor design. The resulting scalar processor is an in-order 32-bit RISC microcontroller that is extremely area-efficient. This paper details the design and implementation of this scalar processor in TSMC 0.18 µm technology. In conjunction with other publications, this paper demonstrates that impressive gains can be achieved with very little "smart" logic added to memory devices.



Citations: 21

A novel pipelined threads architecture for AES encryption algorithm

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors

Pub Date : 2002-07-17 DOI: 10.1109/ASAP.2002.1030728

Mehboob Alam, Wael Badawy, G. Jullien

This paper presents a single-chip parallel architecture for the Advanced Encryption Standard (AES). The proposed architecture uses the thread approach, which integrates fully pipelined parallel units that process 128 bits/cycle and quadruples the data throughput. The threads architecture allows the clock rate to be reduced by a factor of four while maintaining the data throughput, and it consumes less power. The prototype runs at a data rate of 7.68 Gbps on a Xilinx XC2V1500 Virtex-II FPGA. This data rate shows that the proposed thread approach produces one of the fastest single-chip FPGA implementations currently available. In addition, the proposed architecture is scalable to 192, 256 and higher bits.
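The trade-off between parallel units and clock rate described in this abstract follows from simple throughput arithmetic, sketched below in Python. The abstract does not state the clock frequency or exactly how the 128 bits/cycle figure is counted, so the unit counts used here are hypothetical and the resulting clock rates are illustrative only, not figures from the paper.

```python
def throughput_bps(units, bits_per_cycle, clock_hz):
    """Aggregate data rate of `units` pipelines, each emitting
    `bits_per_cycle` bits on every clock cycle."""
    return units * bits_per_cycle * clock_hz


def clock_for(target_bps, units, bits_per_cycle=128):
    """Clock rate needed to sustain `target_bps` with the given pipelines."""
    return target_bps / (units * bits_per_cycle)


if __name__ == "__main__":
    target = 7_680_000_000  # 7.68 Gbps, the data rate quoted in the abstract
    for units in (1, 4):    # hypothetical unit counts
        mhz = clock_for(target, units) / 1e6
        print(f"{units} unit(s): {mhz:.0f} MHz")
```

Under these assumptions, quadrupling the number of 128-bit pipelines at a fixed clock quadruples the throughput, and holding the throughput fixed lets the clock drop by a factor of four, which is the relationship the abstract describes.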


Citations: 24
