
1987 IEEE 8th Symposium on Computer Arithmetic (ARITH): Latest Papers

Vector computations on an orthogonal memory access multiprocessing system
Pub Date : 1987-05-18 DOI: 10.1109/ARITH.1987.6158715
I. Scherson, Yiming Ma
An Orthogonal Memory Access system allows a multiplicity of processors to concurrently access distinct rows or columns of a rectangular array of data elements. The resulting tightly-coupled multiprocessing system is feasible with current technology and has even been suggested for VLSI as a “reduced mesh”. In this paper we introduce the architecture and concentrate on its application to a number of basic vector and numerical computations. Matrix multiplication, L-U decomposition, polynomial evaluation, and solutions to linear systems and partial differential equations all show a speed-up of O(n) for an n-processor system. The flexibility in the choice of the number of PEs makes the architecture a strong competitor in the world of special-purpose parallel systems. In fact, we prove that the machine performs within a factor of 3 of any other system with the same number of processors.
Citations: 11
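The O(n) speed-up stated in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models a generic row-partitioned matrix multiplication in which each of n processing elements (PEs) computes one row of the product, and it is not the paper's actual access schedule. The function name `row_partitioned_matmul` and the per-PE operation counter are hypothetical.

```python
def row_partitioned_matmul(a, b):
    """Multiply two n x n matrices, modelling one PE per output row.

    Assumption: the outer loop stands in for n PEs that would run
    concurrently on real hardware; here they are simulated in sequence.
    Returns the product and the multiply-add count of each simulated PE.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    ops_per_pe = [0] * n
    for pe in range(n):              # one iteration per simulated PE
        for j in range(n):           # PE 'pe' produces c[pe][j]
            acc = 0
            for k in range(n):
                acc += a[pe][k] * b[k][j]
                ops_per_pe[pe] += 1  # count one multiply-add
            c[pe][j] = acc
    return c, ops_per_pe


if __name__ == "__main__":
    product, ops = row_partitioned_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product)
    print(ops)
```

Each simulated PE performs n^2 multiply-adds, whereas a single processor would perform n^3, which is where a speed-up proportional to n comes from when memory access conflicts are ignored.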
Algorithm for high speed shared radix 4 division and radix 4 square-root
Pub Date : 1987-05-18 DOI: 10.1109/ARITH.1987.6158696
J. Fandrianto
An algorithm that implements radix-four division and radix-four square root in shared hardware for the IEEE standard binary floating-point format is described. The algorithm is best suited to implementation either with off-the-shelf components or as a portion of a VLSI floating-point chip. Division and square-root bits are generated by a non-restoring method while keeping the partial remainder, partial radicand, quotient and root all in redundant forms. The core iteration involves an 8-bit carry look-ahead adder, a multiplexer to convert two's complement to sign-magnitude, a 19-term next quotient/root prediction PLA, a divisor/root multiple selector, and a carry-save adder. At the end, two passes of a carry look-ahead adder across the length of the mantissa are required to generate the quotient/root in correctly rounded form. Despite its modest hardware requirements, the algorithm takes only about 30 cycles to compute a double-precision division or square root. Finally, extending the algorithm to radix-eight or higher division/square root is discussed.
Citations: 62
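The non-restoring radix-4 recurrence mentioned in the abstract above can be sketched in software. This is a minimal illustration, not the paper's design: it keeps the remainder in exact rational form instead of a redundant carry-save form, and it selects each quotient digit by exact rounding instead of a prediction PLA. The function name `radix4_divide` and the precondition |x| <= (2/3)d are assumptions chosen so that the signed digit set {-2, ..., 2} always suffices.

```python
from fractions import Fraction


def radix4_divide(x, d, steps):
    """Radix-4 digit-recurrence division with signed digits in {-2..2}.

    Assumptions (not from the paper): exact rational arithmetic, d > 0,
    and |x| <= (2/3) * d so the remainder bound holds from the start.
    Returns (quotient, final_remainder, digits) satisfying
    x == d * quotient + final_remainder / 4**steps.
    """
    x, d = Fraction(x), Fraction(d)
    if d <= 0 or 3 * abs(x) > 2 * d:
        raise ValueError("require d > 0 and |x| <= (2/3) * d")
    w = x                      # partial remainder
    quotient = Fraction(0)
    digits = []
    for j in range(1, steps + 1):
        shifted = 4 * w        # shift the remainder left by one radix-4 digit
        # Pick the nearest digit, clamped to the allowed signed-digit set.
        digit = max(-2, min(2, round(shifted / d)))
        w = shifted - digit * d          # non-restoring update, may go negative
        quotient += Fraction(digit, 4 ** j)
        digits.append(digit)
    return quotient, w, digits


if __name__ == "__main__":
    q, w, digits = radix4_divide(Fraction(1, 3), Fraction(3, 4), 10)
    print(digits)
    print(float(q), float(Fraction(1, 3) / Fraction(3, 4)))
```

Because each step retires two quotient bits, a 53-bit double-precision significand needs roughly 27 iterations, which is in line with the "about 30 cycles" figure once the final conversion and rounding passes are added.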
Fast area-efficient VLSI adders
Pub Date : 1987-05-18 DOI: 10.1109/ARITH.1987.6158699
T. Han, D. A. Carlson
In this paper, we study area-time tradeoffs in VLSI for prefix computation using graph representations of this problem. Since the problem is intimately related to binary addition, the results we obtain lead to the design of area-time efficient VLSI adders. This is a major goal of our work: to design very low latency addition circuitry that is also area efficient. To this end, we present a new graph representation for prefix computation that leads to the design of a fast, area-efficient binary adder. The new graph is a combination of previously known graph representations for prefix computation, and its area is close to known lower bounds on the VLSI area of parallel prefix graphs. Using it, we are able to design VLSI adders having area A = O(n log n) whose delay time is the lowest possible value, i.e., the fastest possible area-efficient VLSI adder.
Citations: 354
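The link between prefix computation and binary addition noted in the abstract above can be sketched as follows. This is an illustration only: it uses the well-known Kogge-Stone prefix schedule, which is one of the previously known prefix graphs, and it does not reproduce the combined graph the paper proposes. The function name `prefix_add` is hypothetical.

```python
def prefix_add(a, b, n):
    """Add two n-bit non-negative integers using a parallel-prefix carry network.

    Assumption: a Kogge-Stone schedule is used purely for illustration.
    Each pass of the while loop corresponds to one level of the prefix
    graph, so the depth is ceil(log2(n)) levels.
    """
    # Per-bit generate and propagate signals, least significant bit first.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    G, P = g[:], p[:]          # group generate / propagate over bits [i-2^k+1 .. i]
    dist = 1
    while dist < n:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):
            # Associative prefix operator: (G, P) o (G', P') = (G | P & G', P & P')
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2

    # After the last level, G[i] is the carry out of bit position i.
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total | (G[n - 1] << n)   # include the final carry-out bit


if __name__ == "__main__":
    print(prefix_add(0b10110111, 0b01101101, 8))
```

The Kogge-Stone schedule reaches logarithmic depth at the cost of many cells and wires per level; trading some of that area for a small amount of extra depth is the kind of area-time tradeoff the abstract refers to.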