The NYU ultracomputer—designing a MIMD, shared-memory parallel machine
A. Gottlieb, R. Grishman, C. Kruskal, K. McAuliffe, L. Rudolph, M. Snir
Proceedings. International Symposium on Computer Architecture, volume 22, issue 1, pages 239-254
Published: 2018-02-24
DOI: 10.1145/285930.285983 (https://doi.org/10.1145/285930.285983)
Citation count: 86
Abstract
The design for the NYU ultracomputer, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements, is presented. This machine uses an enhanced message-switching network with the geometry of an omega-network to approximate the ideal behaviour of Schwartz's paracomputer model of computation and to implement efficiently the important fetch-and-add synchronisation primitive. The hardware that would be required to build a 4096-processor system using 1990s technology is outlined. System software issues are discussed and analytic studies of the network performance are presented. A sample of efforts to implement and simulate parallel variants of important scientific programs is included. 37 references.
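
The fetch-and-add primitive named in the abstract can be illustrated with a small software sketch. Fetch-and-add on a shared integer V with increment e returns the old value of V and replaces V with V + e as one indivisible step, so concurrent callers each see a distinct intermediate value. The Python below is only a stand-in under stated assumptions: the names `SharedCell` and `claim_slots` are hypothetical, a lock emulates the atomicity, and the network-level handling of requests that the paper describes is not modelled.

```python
import threading


class SharedCell:
    """Hypothetical shared integer with a lock-based fetch-and-add."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, increment):
        # Return the old value and add `increment`, as one indivisible step.
        with self._lock:
            old = self._value
            self._value = old + increment
            return old


def claim_slots(n_workers=8, claims_per_worker=100):
    """Let several threads claim unique indices from one shared counter."""
    counter = SharedCell(0)
    claimed = [[] for _ in range(n_workers)]

    def worker(i):
        for _ in range(claims_per_worker):
            # Each call hands back a distinct index, with no further locking
            # needed by the caller.
            claimed[i].append(counter.fetch_and_add(1))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, claimed
```

Calling `claim_slots(4, 50)` would hand out the indices 0 through 199 exactly once across the four workers, which is the pattern that makes the primitive useful for tasks such as assigning loop iterations or queue slots to processors.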