Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18074
C.N. Zhang, H. Martin, D. Yun
Two algorithms for computing modular exponentiation of very large integers are proposed. The first is based on a recoding technique that significantly reduces the total number of modular multiplications. The second is a parallel algorithm that can be implemented on two parallel processors and achieves optimal performance. Two corresponding systolic array designs are developed. The main advantage of these systolic architectures is a potentially higher throughput for large numbers of computations, namely encryptions and decryptions in an RSA cryptosystem.
Title: Parallel algorithms and systolic array designs for RSA cryptosystem. In: [1988] Proceedings. International Conference on Systolic Arrays.
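For orientation, the operation this abstract targets can be sketched with the standard left-to-right binary (square-and-multiply) method. This is only a baseline: it is not the recoding algorithm or the two-processor algorithm from the paper, whose details the abstract does not give. The function name `mod_exp` and the returned multiplication count are illustrative assumptions.

```python
def mod_exp(base, exponent, modulus):
    """Baseline left-to-right binary square-and-multiply (hypothetical helper).

    Returns (result, mults), where mults counts the modular
    multiplications performed, squarings included.
    """
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")
    result = 1 % modulus
    mults = 0
    for bit in bin(exponent)[2:]:  # scan exponent bits, most significant first
        result = (result * result) % modulus  # one squaring per bit
        mults += 1
        if bit == "1":
            result = (result * base) % modulus  # extra multiply for each 1 bit
            mults += 1
    return result, mults


# Toy RSA round trip with deliberately tiny, insecure textbook parameters.
n, e, d = 61 * 53, 17, 2753
cipher, _ = mod_exp(65, e, n)
plain, _ = mod_exp(cipher, d, n)
```

The count makes the cost model visible: every exponent bit costs one squaring and every 1 bit costs one further multiplication, which is the quantity a recoding of the exponent would aim to reduce. Results can be cross-checked against Python's built-in `pow(base, exponent, modulus)`.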
Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18088
R. Lea
The author discusses the Associative String Processor (ASP), a homogeneous, reconfigurable, and programmable parallel processing computational architecture that provides the base technology for the development of high-performance, fault-tolerant computer add-on processors to be applied to a wide range of information processing tasks. He presents the architectural principles and VLSI/ULSI/WSI implementation of the ASP and indicates its cost-performance potential.
Title: The ASP, a fault tolerant VLSI/ULSI/WSI associative string processor for cost-effective systolic processing. In: [1988] Proceedings. International Conference on Systolic Arrays.
Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18062
Robert E. Morley, T. J. Sullivan
The design of a massively parallel processor, comprising 2304 bit-serial processor elements arranged in a 48 by 48 systolic array, is described. The system consists of the processor array, a microstore controller, and a host computer interface. Program development tools are available on the host computer. The processor array uses 32 NCR GAPP (Geometric Arithmetic Parallel Processor) microprocessor chips, while the microstore controller is implemented with a TMS32010 DSP chip and TTL (transistor-transistor logic) circuitry. Utilizing the nearest-neighbor communication capabilities of the GAPP, the array receives data from the host at the south edge of the array, outputs data to the host at the north edge of the array, and can wrap data between either the east and west or the north and south edges. The array can also be configured as a linear array of 2304 processor elements. The microstore controller interfaces with the host and facilitates downloading of GAPP array machine code, provides for the debugging and monitoring of GAPP array execution from the host, and implements user-defined instructions.
Title: A massively parallel systolic array processor system. In: [1988] Proceedings. International Conference on Systolic Arrays.
Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18058
K. Collins, J.B.G. Roberts
A demanding problem involving several algorithmic phases with varying degrees of regularity and data dependence is used to show that a network of transputers programmed in OCCAM has all the attributes needed to explore several processing paradigms. Two alternative organizations of the problem on a network of 21 transputers are compared from the standpoints of speed, hardware efficiency, and ease of programming. Two highly parallel implementations of an algorithm that constitutes part of a real-time system to generate terrain relief maps from satellite stereo image pairs have been programmed. An optimum strategy that demonstrates the power of MIMD (multiple instruction, multiple data streams) parallel computing is determined.
Title: Stereo matching of satellite images with transputers. In: [1988] Proceedings. International Conference on Systolic Arrays.
Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18085
C. J. Anfinson, F. Luk
Algorithm-based fault tolerance provides a means of low-cost error protection in real-time signal-processing environments. A novel linear algebraic interpretation is developed for previously proposed algorithm-based fault-tolerance schemes. The concepts of distance, code space, and the definitions of detection and correction in the vector space R^n are clarified. Error detection and error correction performances are proved for distance-(d+1) codes. It is shown why the correction scheme does not work for general weight vectors, and a novel fast-correction algorithm is derived for a distance-5 code.
Title: A linear algebraic model of algorithmic-based fault tolerance. In: [1988] Proceedings. International Conference on Systolic Arrays.
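As background for the scheme family this abstract analyzes, the sketch below shows the classic row/column checksum idea applied to matrix multiplication, using an all-ones weight vector. It is an assumption that this is representative of the "previously proposed" schemes; the paper's weighted codes and its distance-5 fast-correction algorithm are not reproduced here. All function names are hypothetical, and integer entries are used so that checksum comparisons are exact.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add_column_checksum(a):
    """Append one row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b):
    """Append one column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]


def locate_single_error(c_full):
    """Return (row, col) of a single corrupted data element, or None if consistent."""
    n_rows, n_cols = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if sum(c_full[i][:n_cols]) != c_full[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(c_full[i][j] for i in range(n_rows)) != c_full[n_rows][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single data-element error")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# The product of the checksum-extended operands carries its own checksums.
c_full = matmul(add_column_checksum(a), add_row_checksum(b))
clean_result = locate_single_error(c_full)
c_full[0][1] += 3  # inject a single fault into a data element
faulty_result = locate_single_error(c_full)
```

The intersection of the one inconsistent row and the one inconsistent column pinpoints the faulty element, which is why a single error is locatable with this code; handling more errors requires the higher-distance codes the abstract refers to.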
Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18043
R. Hughey, Daniel P. Lopresti
The architecture of a simple but programmable linear systolic array tuned to support a variety of symbolic computations is presented. The system, the Brown Systolic Array (B-SYS), is currently being implemented in CMOS. B-SYS demonstrates that programmable processor arrays may be made fully systolic with no need for local program memory or global instruction broadcasting. Any hazards introduced by the systolic instruction stream can be avoided using a processing phase concept. The application of these ideas results in a basic cell that is both simple and flexible, making it possible to build massively parallel, programmable systolic arrays.
Title: Architecture of a programmable systolic array. In: [1988] Proceedings. International Conference on Systolic Arrays.
Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18091
Steven D. Kugelmass, K. Steiglitz
A probabilistic model for the accumulation of clock skew in synchronous systems is presented. The model is used to derive upper bounds for expected skew and its variance in tree distribution systems with N synchronously clocked processing elements. The results are applied to two specific models for clock distribution. In the first, which is called metric-free, the skew in a buffer stage is Gaussian with a variance independent of wire length. The second, the metric model, is intended to reflect VLSI constraints: the clock skew in a stage is Gaussian with a variance proportional to wire length, and the distribution tree is an H-tree embedded in the plane. Upper bounds on skew are obtained for both models. Estimates of the constants of proportionality as well as the asymptotic behavior have been obtained and verified by simulation.
Title: A probabilistic model for clock skew. In: [1988] Proceedings. International Conference on Systolic Arrays.
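The metric-free model described above lends itself to a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions rather than the authors' simulation: the distribution tree is taken to be a complete binary tree, each buffer stage adds an independent zero-mean Gaussian deviation with the same standard deviation `sigma`, and skew is taken to be the spread between the earliest and latest clock arrival at the leaves. The function names and default parameters are hypothetical.

```python
import random


def simulate_skew(depth, sigma=1.0, rng=None):
    """One trial: skew across the 2**depth leaves of a binary clock tree."""
    rng = rng or random.Random()
    arrivals = [0.0]  # arrival-time deviation at the root
    for _ in range(depth):
        # Each node drives two children through independent buffer stages.
        arrivals = [t + rng.gauss(0.0, sigma)
                    for t in arrivals
                    for _child in range(2)]
    return max(arrivals) - min(arrivals)


def mean_skew(depth, trials=200, sigma=1.0, seed=0):
    """Average skew over repeated trials, seeded for repeatability."""
    rng = random.Random(seed)
    return sum(simulate_skew(depth, sigma, rng) for _ in range(trials)) / trials


# Estimated expected skew for trees with N = 2, 4, ..., 64 leaves.
estimates = {2 ** depth: mean_skew(depth) for depth in range(1, 7)}
```

Plotting `estimates` against N gives an empirical growth curve that can be compared with an analytical upper bound. Adapting the sketch to the metric model would mean scaling each stage's variance with its wire length in an H-tree, which is not done here.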
Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18070
Hans-Werner Lang
The instruction systolic array (ISA) is an array processor architecture that is characterized by a systolic flow of instructions (instead of data as in standard systolic arrays). It is shown how the well-known Warshall algorithm for computing the transitive closure of a directed graph can be implemented on an n × n ISA. For problem sizes m ≤ n, the time complexity of this implementation is O(m).
Title: Transitive closure on an instruction systolic array. In: [1988] Proceedings. International Conference on Systolic Arrays.
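For reference, the Warshall algorithm named in this abstract can be written sequentially as below. This is the textbook version, which takes time cubic in the number of vertices on one processor; it does not show the ISA mapping that brings the time down to O(m). The function name and the boolean-matrix representation are illustrative choices.

```python
def transitive_closure(adj):
    """Warshall's algorithm on an adjacency matrix (lists of 0/1 or bools).

    Returns a new matrix where reach[i][j] is True exactly when there is
    a directed path of length >= 1 from vertex i to vertex j.
    """
    n = len(adj)
    reach = [[bool(x) for x in row] for row in adj]
    for k in range(n):          # allow vertex k as an intermediate stop
        for i in range(n):
            if reach[i][k]:     # i reaches k, so i reaches whatever k reaches
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Chain 0 -> 1 -> 2: the closure adds the edge 0 -> 2.
chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
closure = transitive_closure(chain)
```

The outer loop over `k` is inherently sequential, while the work for a fixed `k` is independent across all (i, j) pairs; that independence is what an n × n processor array can exploit.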
Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18048
T. De Saint Pierre, M. Milgram
Straight-line-edge extraction can be carried out in two successive phases: identifying the pixels that belong to edges and constructing straight-line segments from these edge pixels. A parallel approach based on a cellular algorithm is proposed for the second phase. Each cell sends a message that compiles distances between a pattern segment and the real segment on the image. The value of the message identifies a segment and encodes its length and endpoints. If the parameters of the algorithm are properly chosen, it can be adjusted to different kinds of contours: noisy or blurred edges and disconnected segments. The algorithm takes computation time proportional to the linear dimension of the image (for an image of N × N pixels the linear dimension is N) and the number of generalized directions.
Title: A cellular algorithm for straight line extraction. In: [1988] Proceedings. International Conference on Systolic Arrays.
Pub Date: 1988-05-25; DOI: 10.1109/ARRAYS.1988.18083
S. Manohar, G. Baudet
Limitations of current systolic designs are pointed out, and constraints are imposed to make systolic solutions practical. Matrix multiplication is used as an illustration, and a simple but very-high-performance systolic architecture, the Superprocessor for Matrix Problems (S-MP), that satisfies these constraints is presented. Implementation alternatives for the linear systolic array for matrix-vector multiplication, which forms the core of S-MP, are described.
Title: A pragmatic approach to systolic design. In: [1988] Proceedings. International Conference on Systolic Arrays.
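To make the idea of a linear systolic array for matrix-vector multiplication concrete, here is a cycle-level sketch of one generic schedule. It is not the S-MP design or any of the implementation alternatives from the paper, which the abstract does not detail. The assumed arrangement: cell i keeps the accumulator for output y[i], the elements of x are pumped through the cells one hop per clock step so that x[j] reaches cell i at step i + j, and the matching matrix entry a[i][j] is fed into cell i at that same step. The function name is hypothetical.

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = a * x.

    Returns (y, steps), where steps is the number of clock steps used.
    """
    n_rows, n_cols = len(a), len(x)
    y = [0] * n_rows
    steps = n_rows + n_cols - 1 if n_rows and n_cols else 0
    for t in range(steps):
        for i in range(n_rows):  # all cells act in the same clock step
            j = t - i            # index of the x element now inside cell i
            if 0 <= j < n_cols:
                y[i] += a[i][j] * x[j]  # one multiply-accumulate per cell
    return y, steps


y, steps = systolic_matvec([[1, 2], [3, 4]], [5, 6])
```

Each cell performs at most one multiply-accumulate per step and only exchanges data with its neighbor, so an n × m product finishes in n + m - 1 steps instead of the n · m operations a single processor would perform in sequence.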