Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18043
R. Hughey, Daniel P. Lopresti
The architecture of a simple but programmable linear systolic array tuned to support a variety of symbolic computations is presented. The system, the Brown Systolic Array (B-SYS), is currently being implemented in CMOS. B-SYS demonstrates that programmable processor arrays may be made fully systolic with no need for local program memory or global instruction broadcasting. Any hazards introduced by the systolic instruction stream can be avoided using a processing phase concept. The application of these ideas results in a basic cell that is both simple and flexible, making it possible to build massively parallel, programmable systolic arrays.
Title : Architecture of a programmable systolic array. In: [1988] Proceedings. International Conference on Systolic Arrays.
Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18058
K. Collins, J.B.G. Roberts
A demanding problem involving several algorithmic phases with varying degrees of regularity and data dependence is used to show that a network of transputers programmed in OCCAM has all the attributes needed to explore several processing paradigms. Two alternative organizations of the problem on a network of 21 transputers are compared from the standpoints of speed, hardware efficiency, and ease of programming. Two highly parallel implementations of an algorithm that constitutes part of a real-time system to generate terrain relief maps from satellite stereo image pairs have been programmed. An optimum strategy that demonstrates the power of MIMD (multiple instruction, multiple data streams) parallel computing is determined.
Title : Stereo matching of satellite images with transputers
Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18062
Robert E. Morley, T. J. Sullivan
The design of a massively parallel processor, comprising 2304 bit-serial processor elements arranged in a 48 by 48 systolic array, is described. The system consists of the processor array, a microstore controller, and a host computer interface. Program development tools are available on the host computer. The processor array uses 32 NCR GAPP (Geometric Arithmetic Parallel Processor) microprocessor chips, while the microstore controller is implemented with a TMS32010 DSP chip and TTL (transistor-transistor logic) circuitry. Utilizing the nearest-neighbor communication capabilities of the GAPP, the array receives data from the host at the south edge of the array, outputs data to the host at the north edge of the array, and can wrap data between either the east and west or north and south edges. The array can also be configured as a linear array of 2304 processor elements. The microstore controller interfaces with the host and facilitates downloading of GAPP array machine code, provides for the debugging and monitoring of GAPP array execution from the host, and implements user-defined instructions.
Title : A massively parallel systolic array processor system
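The data movement described in this abstract (input at the south edge, output at the north edge, optional north-south wrap) can be pictured with a small sketch. This is only an illustration under stated assumptions: the function name `step_north` and the list-of-rows grid are hypothetical, row 0 is taken to be the north edge, and nothing here reflects the actual GAPP instruction set. The only figures taken from the abstract are the 48 by 48 array and the 32 chips, which together imply 2304 / 32 = 72 processor elements per chip.

```python
# Hypothetical model of one northward shift step on a 2-D array of cells.
# Row 0 is assumed to be the north edge; the last row is the south edge.

def step_north(grid, south_input=None):
    """Shift every row one step north.

    Returns (new_grid, north_output). If south_input is None, the row
    leaving the north edge wraps around and re-enters at the south edge;
    otherwise south_input is taken as fresh data arriving from the host.
    """
    north_output = grid[0][:]
    incoming = north_output[:] if south_input is None else list(south_input)
    new_grid = [row[:] for row in grid[1:]] + [incoming]
    return new_grid, north_output


# Arithmetic implied by the abstract's figures.
ROWS = COLS = 48
CHIPS = 32
PES_PER_CHIP = (ROWS * COLS) // CHIPS  # 2304 / 32
```

Calling `step_north(grid)` models the wrapped configuration, while `step_north(grid, south_input=row)` models streaming a new row in from the host.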
Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18074
C.N. Zhang, H. Martin, D. Yun
Two algorithms for computing very large integer modular exponentiation are proposed. One is based on a recording technique that significantly reduces the total number of modular multiplications. The second is a parallel algorithm that can be implemented by two parallel processors and achieves optimal performance. Two corresponding systolic array designs are developed. The main advantage of these systolic architectures is to provide a potentially higher throughput for a large number of computations, namely, encryptions and decryptions in an RSA cryptosystem.
Title : Parallel algorithms and systolic array designs for RSA cryptosystem
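The abstract does not spell out either algorithm, so the following is only a sketch of the baseline it improves on, plus one standard way of trading a small table for fewer multiplications. `modexp_binary` is ordinary left-to-right square-and-multiply; `modexp_window` is a fixed-width window method, used here as a stand-in and not as the authors' technique. Both function names and the multiplication counters are assumptions made for illustration.

```python
def modexp_binary(base, exponent, modulus):
    """Left-to-right square-and-multiply.

    Returns (base**exponent % modulus, number of modular multiplications).
    """
    result = 1 % modulus
    mults = 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus  # one squaring per exponent bit
        mults += 1
        if bit == "1":
            result = (result * base) % modulus  # extra multiply for a 1 bit
            mults += 1
    return result, mults


def modexp_window(base, exponent, modulus, width=2):
    """Fixed-window exponentiation: process `width` exponent bits at a time."""
    # Precompute base**0 .. base**(2**width - 1).
    table = [1 % modulus]
    mults = 0
    for _ in range((1 << width) - 1):
        table.append((table[-1] * base) % modulus)
        mults += 1
    # Split the exponent into digits of `width` bits, most significant first.
    digits = []
    e = exponent
    while e > 0:
        digits.append(e & ((1 << width) - 1))
        e >>= width
    digits.reverse()
    result = 1 % modulus
    for d in digits:
        for _ in range(width):
            result = (result * result) % modulus
            mults += 1
        if d:
            result = (result * table[d]) % modulus  # one multiply per nonzero digit
            mults += 1
    return result, mults
```

Comparing the two counters for a given exponent shows how grouping exponent bits reduces the number of non-squaring multiplications, which is the general kind of saving the abstract refers to.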
Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18085
C. J. Anfinson, F. Luk
Algorithm-based fault tolerance provides a means of low-cost error protection in real-time signal-processing environments. A novel linear algebraic interpretation is developed for previously proposed algorithm-based fault-tolerance schemes. The concepts of distance, code space, and the definitions of detection and correction in the vector space R^n are clarified. Error detection and error correction performances are proved for distance-d+1 codes. It is shown why the correction scheme does not work for general weight vectors, and a novel fast-correction algorithm is derived for a distance-5 code.
Title : A linear algebraic model of algorithmic-based fault tolerance
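As a concrete picture of what a weighted-checksum code over R^n looks like, here is a minimal sketch with two checksum elements. The weight vectors (1, 1, ..., 1) and (1, 2, ..., n) are a common textbook choice and are an assumption here, as are the function names; the abstract does not state which weights the paper analyzes, and this is not its distance-5 fast-correction algorithm. Integer data is used to sidestep rounding issues.

```python
def encode(data):
    """Append two weighted checksums to a data vector."""
    c1 = sum(data)                                      # weights (1, 1, ..., 1)
    c2 = sum((i + 1) * x for i, x in enumerate(data))   # weights (1, 2, ..., n)
    return list(data) + [c1, c2]


def check_and_correct(word):
    """Return (data, corrected_index or None) for a possibly corrupted word.

    A single error of size e at data position j (1-based) gives syndromes
    s1 = e and s2 = j * e, so the ratio s2 / s1 locates the error.
    """
    data, c1, c2 = list(word[:-2]), word[-2], word[-1]
    s1 = sum(data) - c1
    s2 = sum((i + 1) * x for i, x in enumerate(data)) - c2
    if s1 == 0 and s2 == 0:
        return data, None  # consistent with the code space
    if s1 != 0 and s2 % s1 == 0 and 1 <= s2 // s1 <= len(data):
        pos = s2 // s1 - 1
        data[pos] -= s1    # subtract the error to restore the value
        return data, pos
    # Detected, but not explainable as a single error in a data element
    # (for example, a corrupted checksum or several errors).
    raise ValueError("error detected but not correctable as a single data error")
```

For example, corrupting one entry of `encode([3, 1, 4, 1, 5])` and passing the result to `check_and_correct` recovers the original data together with the index of the faulty element.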
Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18088
R. Lea
The author discusses the Associative String Processor (ASP), a homogeneous, reconfigurable, and programmable parallel processing computational architecture that provides the base technology for the development of high-performance, fault-tolerant computer add-on processors to be applied to a wide range of information processing tasks. He presents the architectural principles and VLSI/ULSI/WSI implementation of the ASP and indicates its cost-performance potential.
Title : The ASP, a fault tolerant VLSI/ULSI/WSI associative string processor for cost-effective systolic processing
Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18091
Steven D. Kugelmass, K. Steiglitz
A probabilistic model for the accumulation of clock skew in synchronous systems is presented. The model is used to derive upper bounds for expected skew and its variance in tree distribution systems with N synchronously clocked processing elements. The results are applied to two specific models for clock distribution. In the first, which is called metric-free, the skew in a buffer stage is Gaussian with a variance independent of wire length. The second, the metric model, is intended to reflect VLSI constraints: the clock skew in a stage is Gaussian with a variance proportional to wire length, and the distribution tree is an H-tree embedded in the plane. Upper bounds on skew are obtained for both models. Estimates of the constants of proportionality as well as the asymptotic behavior have been obtained and verified by simulation.
Title : A probabilistic model for clock skew
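The metric-free model lends itself to a quick Monte Carlo experiment. The sketch below makes several assumptions that the abstract does not confirm: the distribution tree is a complete binary tree, each buffer stage adds an independent zero-mean Gaussian delay with the same standard deviation `sigma`, and skew is measured as the largest difference in arrival time between any two leaves. The function names are hypothetical, and this is not the authors' simulation.

```python
import random


def simulate_skew(depth, sigma=1.0, rng=random):
    """One trial: clock skew across the 2**depth leaves of a binary tree."""
    arrivals = [0.0]  # arrival time at the root
    for _ in range(depth):
        # Each node feeds two children through independent buffer stages.
        arrivals = [t + rng.gauss(0.0, sigma) for t in arrivals for _ in range(2)]
    return max(arrivals) - min(arrivals)


def mean_skew(depth, trials=1000, sigma=1.0, seed=0):
    """Average skew over many trials, using a seeded generator for repeatability."""
    rng = random.Random(seed)
    return sum(simulate_skew(depth, sigma, rng) for _ in range(trials)) / trials
```

Evaluating `mean_skew` for increasing `depth` gives an empirical growth curve that could be compared against an analytical upper bound of the kind the abstract describes.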
Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18101
F. Gaston, G. Irwin
An alternative square-root-information Kalman filter algorithm based on orthogonal transformation is described and proved mathematically. The filter can be realized on a rectangular systolic array using n(n+1) processing cells and takes 3n+m timesteps between measurements. Comparisons are made with recent work of M.J. Chen and K. Yao (1986) and S.Y. Kung (1988), and it is shown that this algorithm achieves a processor utilization of approximately twice that of Chen and Yao at a speed that is 25% faster than Kung's.
Title : A systolic square root information Kalman filter
Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18055
W. Phillips, W. Robertson
The first step in the development of a chip set to support eigenvalue-eigenvector-based estimation algorithms is presented. It is based on the assumption that an averaging technique will produce a symmetric covariance matrix. Such a matrix can be reduced to a symmetric tridiagonal matrix, and hence the eigenvalues and eigenvectors can be found by successive iterations involving QR decomposition. The architecture is unique in that other architectures either solve only for the eigenvalues or use methods other than QR iteration. It has potential for use in a systolic computer for computation-intensive digital signal processing based on modern spectral-analysis techniques.
Title : A systolic architecture for the symmetric tridiagonal eigenvalue problem
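For readers unfamiliar with QR iteration, the following sequential sketch shows the basic idea on a small symmetric tridiagonal matrix. It is an assumption-laden illustration, not the systolic architecture from the paper: it uses plain unshifted iteration with Givens rotations, works on dense lists of lists, and returns eigenvalues only (recovering eigenvectors would additionally require accumulating the rotations). The function names are hypothetical.

```python
import math


def qr_step(a):
    """One unshifted QR iteration: factor A = QR, then return RQ."""
    n = len(a)
    r = [row[:] for row in a]
    rotations = []
    for k in range(n - 1):
        # Givens rotation that zeroes the subdiagonal entry r[k+1][k].
        x, y = r[k][k], r[k + 1][k]
        h = math.hypot(x, y)
        c, s = (1.0, 0.0) if h == 0.0 else (x / h, y / h)
        rotations.append((c, s))
        for j in range(n):
            u, v = r[k][j], r[k + 1][j]
            r[k][j] = c * u + s * v
            r[k + 1][j] = -s * u + c * v
    # Multiply R by the transposed rotations on the right to form RQ.
    for k, (c, s) in enumerate(rotations):
        for i in range(n):
            u, v = r[i][k], r[i][k + 1]
            r[i][k] = c * u + s * v
            r[i][k + 1] = -s * u + c * v
    return r


def eigenvalues(a, iterations=200):
    """Approximate eigenvalues as the diagonal after repeated QR steps."""
    m = [row[:] for row in a]
    for _ in range(iterations):
        m = qr_step(m)
    return [m[i][i] for i in range(len(m))]
```

Unshifted iteration converges slowly when eigenvalues are close in magnitude; practical implementations normally add shifts and deflation, which are omitted here for brevity.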
Pub Date : 1988-05-25; DOI: 10.1109/ARRAYS.1988.18070
Hans-Werner Lang
The instruction systolic array (ISA) is an array processor architecture that is characterized by a systolic flow of instructions (instead of data as in standard systolic arrays). It is shown how the well-known Warshall algorithm for computing the transitive closure of a directed graph can be implemented on an n × n ISA. For problem sizes m ≤ n, the time complexity of this implementation is O(m).
Title : Transitive closure on an instruction systolic array
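For reference, the Warshall algorithm named in this abstract can be written sequentially in a few lines. The sketch below is the ordinary single-processor version, which takes time cubic in the number of vertices; it does not model the instruction systolic array or its O(m) parallel schedule. The function name and the boolean adjacency-matrix representation are choices made here for illustration.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix.

    adj[i][j] is True when there is an edge i -> j. The result has
    reach[i][j] True when some path of length >= 1 leads from i to j.
    """
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):              # allow vertex k as an intermediate stop
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach
```

On a three-vertex chain with edges 0 -> 1 and 1 -> 2, the closure adds the derived edge 0 -> 2 and leaves every other entry unchanged.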