
Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing: Latest Articles

A high level language for the RAPID-2 massively parallel accelerator board
P. Faudemay, L. Winckel
In this paper, we present an abstract model of the RAPID-2 SIMD architecture. RAPID-2 is a massively parallel add-on board for PCs. It implements a "paginated set-associative" architecture model and has systolic capabilities. The L1 language implements the abstract model. L1 is a co-specification language, derived from C, for the programming and micro-programming of RAPID-2. To check their semantics, L1 programs can be emulated in a C++ environment. In the near future, they should be compiled into C application programs and the corresponding microprograms.
DOI: https://doi.org/10.1109/ICAPP.1995.472171 (published 1995-04-19)
Citations: 4
Mapping nested loop algorithms into fault-tolerant systolic array architectures
M. Esonu, A. Al-Khalili, S. Hariri
Progress in VLSI and WSI technologies has resulted in the manufacture of special-purpose VLSI chips with multiple copies of low-cost processors. These processors can be used to design high-performance systems such as systolic arrays. This paper proposes a new systematic approach for detecting and correcting errors in systolic array architectures. The approach relies on space-time mapping of algorithms into systolic arrays. Fault-tolerant algorithms are designed by introducing redundant computations at the algorithmic level. This is done by deriving several versions of a given algorithm, each of which can be mapped onto its own systolic architecture. A fault-tolerant systolic array is then constructed by merging the systolic arrays that correspond to these versions of the algorithm.
DOI: https://doi.org/10.1109/ICAPP.1995.472167 (published 1995-04-19)
Citations: 0
Fast parallel algorithms for testing k-connectivity of directed and undirected graphs
W. Liang, B. McKay
No NC algorithms appear to have been published previously for testing a directed graph for k-edge connectivity or k-vertex connectivity, even for fixed k>1. Using an elementary flow method we give such algorithms, with time complexity O(k log n) using nP(n,m) or (n+k^2)P(n,m) processors, respectively. Here, n is the number of vertices, m is the number of edges, P(n,m) is the number of processors needed to find some path in O(log n) time between two specified vertices in a directed graph with O(n) vertices and O(m) edges, and the computation model is a CRCW PRAM. These algorithms of course also apply to undirected graphs, but using sparse certificates we can improve the factors P(n,m) to P(n,kn) for both types of connectivity. This is better in time by a factor of O(k) than previous algorithms for undirected graphs. We also note that edge connectivity is NC-reducible to vertex connectivity even if k is not fixed.
DOI: https://doi.org/10.1109/ICAPP.1995.472215 (published 1995-04-19)
Citations: 3
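The abstract above describes a flow-based test in which each of roughly n flow problems needs at most k path searches. The following is only a sequential sketch of that idea for k-edge connectivity, not the authors' PRAM algorithm: it assumes vertices are numbered 0..n-1, fixes vertex 0 as a root, and the function names `edge_disjoint_paths` and `is_k_edge_connected` are hypothetical. It relies on the fact that a directed graph is k-edge-connected exactly when the root has k edge-disjoint paths to and from every other vertex.

```python
from collections import deque


def edge_disjoint_paths(n, edges, s, t, limit):
    """Return min(limit, number of edge-disjoint s->t paths) in a directed graph."""
    # Unit-capacity residual network; parallel edges add up.
    cap = [dict() for _ in range(n)]
    for u, v in edges:
        cap[u][v] = cap[u].get(v, 0) + 1
        cap[v].setdefault(u, 0)  # reverse residual edge, initially empty
    found = 0
    while found < limit:
        # One breadth-first path search in the residual network.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no further augmenting path exists
        # Push one unit of flow back along the path that was found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        found += 1
    return found


def is_k_edge_connected(n, edges, k):
    """Test k-edge connectivity with at most k path searches per flow problem."""
    if n < 2:
        return True  # convention chosen for this sketch
    root = 0  # assumption: any fixed vertex works as the root
    for v in range(1, n):
        if edge_disjoint_paths(n, edges, root, v, k) < k:
            return False
        if edge_disjoint_paths(n, edges, v, root, k) < k:
            return False
    return True


# A directed 4-cycle, and the same cycle with every edge also reversed.
cycle = [(i, (i + 1) % 4) for i in range(4)]
both_ways = cycle + [(v, u) for u, v in cycle]
```

In the parallel setting of the abstract, the 2(n-1) flow problems are independent, which is where the factor n in the processor count would come from; the sketch simply runs them one after another.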
Stereocorrelation on the parallel OPENVISION system
C. Mazzoni, H. Essafi, P. Julien, O. Jamet
The aim of this work is to reduce the computation time needed to produce Digital Elevation Models (DEM) by using a parallel machine. It is a collaboration between the French "Institut Geographique National" (IGN) and LETI-DEIN, a department of the French Atomic Energy Commission (CEA). The IGN has developed a system that provides accurate DEMs, which are used to produce commercial topographic maps. The kernel of this system is the correlator, a piece of software that automatically matches pairs of homologous points (i.e. pairs of points representing the same ground detail) and supplies disparities. The correlator is, however, expensive in computing time. CEA-LETI is involved in parallel architecture and image processing and has developed a SIMD (Single Instruction Multiple Data) parallel architecture called SYMPATI 2. This structure constitutes the kernel of the parallel OPENVISION system, which is commercialized by Centralp Automatisme. In order to reduce the computation time while producing DEMs with the same accuracy as the scalar approach, the two partners set out to parallelize the IGN's correlator on the OPENVISION system.
DOI: https://doi.org/10.1109/ICAPP.1995.472214 (published 1995-04-19)
Citations: 1
A parallel method for finding the convex hull of discs
Wei Chen, K. Wada, K. Kawaguchi
We present a parallel method for finding the convex hull of a set of discs in the CREW PRAM model. We show that the convex hull of n discs can be computed in O(log^(1+ε) n) time using O(n/log^ε n) processors, where ε is any positive constant. We also show that it can be constructed in O(log n loglog n) time using O(n log n) processors. The first result is cost-optimal and the second runs faster. The main technique used in the algorithm is a complex divide-and-conquer technique.
DOI: https://doi.org/10.1109/ICAPP.1995.472195 (published 1995-04-19)
Citations: 3
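The cost-optimality claim for the first result can be checked by multiplying running time by processor count. The short derivation below assumes the standard Ω(n log n) sequential lower bound for convex hulls, which carries over to discs because points are discs of radius zero; the symbols T_1, P_1, and T_2 are labels introduced here for the two results.

```latex
% First result: cost = time x processors
T_1 \cdot P_1
  = O\!\left(\log^{1+\varepsilon} n\right) \cdot O\!\left(\frac{n}{\log^{\varepsilon} n}\right)
  = O(n \log n),
% which matches the \Omega(n \log n) sequential lower bound, hence cost-optimal.

% Second result: faster in time, since for any constant \varepsilon > 0
T_2 = O(\log n \,\log\log n) = o\!\left(\log^{1+\varepsilon} n\right),
\quad\text{because } \log\log n = o\!\left(\log^{\varepsilon} n\right).
```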
A novel stereovision architecture for real-time obstacle detection
B. Arion, Y. Ni, F. Devos
This paper presents a novel stereo architecture to implement a stereovision algorithm suited to obstacle detection and collision avoidance applications in real-time environments. The algorithm is derived from the spatio-frequency analysis proposed by Marr and Poggio (1979), with the matching primitives resulting from a local extremum extraction in the band-pass filtered stereopair images. A VLSI implementation is motivated both by the high processing speed required in such a real-time context and by the simplicity and regularity of the algorithm. We have therefore designed a retina-like architecture, with an in-line parallel processing unit interfaced with an on-chip photodiode matrix sensor. This stereo retina system has been successfully simulated with images of real scenes taken from a car travelling on a highway. A 128-pixel line CMOS retina has been designed and fabricated. Initial experimental results are positive.
DOI: https://doi.org/10.1109/ICAPP.1995.472210 (published 1995-04-19)
Citations: 2
A numeric weather prediction model for the IBM SP2 parallel computer
Glenn R. Wightwick, L. Leslie, S. F. Wail
A limited-area numeric weather prediction model specifically targeted at parallel computers has been successfully implemented on an IBM SP2 distributed-memory parallel computer. The model employs an explicit finite-difference scheme and was parallelised using a simple domain decomposition technique. On a twelve-processor SP2, a 24-hour forecast using archived operational data and including a sophisticated representation of physical processes was run at a range of resolutions between 150 km and 19 km, and near-linear speedups were achieved. Major weather centres have indicated a requirement for regional prediction models to be run at resolutions of approximately 5 km by the end of the decade. Based on this work, it appears that this target can be achieved through the use of scalable parallel computers.
DOI: https://doi.org/10.1109/ICAPP.1995.472191 (published 1995-04-19)
Citations: 2
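To illustrate what domain decomposition of an explicit finite-difference scheme involves, here is a toy sketch. It is not the weather model from the abstract: it assumes a 1-D diffusion equation with fixed end values, simulates the workers sequentially instead of on distributed memory, and the names `step`, `split`, and `step_decomposed` are hypothetical. The point it shows is that each chunk needs only one "halo" value from each neighbour per time step to reproduce the single-domain result.

```python
def step(u, r):
    """One explicit finite-difference step of 1-D diffusion; end values stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def split(u, parts):
    """Cut the interior points of u into contiguous index ranges, one per worker."""
    interior = len(u) - 2
    bounds = [1 + (interior * p) // parts for p in range(parts + 1)]
    return [(bounds[p], bounds[p + 1]) for p in range(parts)]


def step_decomposed(u, r, parts):
    """The same step computed chunk by chunk, each chunk padded with halo cells."""
    new = u[:]
    for lo, hi in split(u, parts):
        # Stand-in for a halo exchange: one neighbouring value on each side.
        local = u[lo - 1:hi + 1]
        updated = step(local, r)
        new[lo:hi] = updated[1:-1]  # keep only the points this worker owns
    return new


# 24 interior points shared among 12 simulated workers (2 points each).
field = [0.0] * 26
field[13] = 1.0
single, decomposed = field, field
for _ in range(50):
    single = step(single, 0.25)
    decomposed = step_decomposed(decomposed, 0.25, 12)
print(single == decomposed)
```

Because every point is updated from the same three values in the same order, the decomposed run matches the single-domain run exactly. On a real distributed-memory machine the slice `u[lo - 1:hi + 1]` would be replaced by message passing between neighbouring processors.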
Design and performance measurements of an execution model for the parallel processing of Prolog programs
Dong Wang, Hiroaki Kobayashi, Tadao Nakamura
This paper presents a hierarchical parallel execution model for Prolog programs. The execution model is based on Or-parallelism/And-parallelism as coarse-grain parallelism, and parallel unification as fine-grain parallelism. At the coarse-grain parallelism level we propose an extended And-Or tree, which can exploit a high degree of parallelism from Prolog programs. Exploiting the parallelism of Prolog programs is based on the binding-arrays method for Or-parallelism and the restricted And-parallelism (RAP) method for And-parallelism. At the fine-grain parallelism level, parallel unification is performed. In general, parallel unification consists of parallel argument matching and consistency checking. However, since the RAP method does not need consistency checking, consistency checking at the fine-grain parallelism level is also removed. Measurements of the degree of parallelism of this model are also presented in this paper.
DOI: https://doi.org/10.1109/ICAPP.1995.472252 (published 1995-04-19)
Citations: 0
On the doubly-linked list protocol for distributed shared memory multiprocessor systems
Author Lau, K. Leung, N. Yung, Y. Cheung
This paper introduces the doubly-linked list (DLL) protocol for distributed shared memory (DSM) multiprocessor systems. The protocol makes use of two linked lists to keep track of valid copies of pages in the system, thus eliminating the use of copysets. Simulation studies show that the DLL protocol achieves considerable speed-up for common mathematical problems, including a linear equations solver and a matrix multiplier. A performance improvement of up to 51.9% over the dynamic distributed manager algorithm is obtained. Further improvements and possible modifications of the protocol are also discussed.
DOI: https://doi.org/10.1109/ICAPP.1995.472198 (published 1995-04-19)
Citations: 2
Implementing photon event recognition algorithms on a 3D-flow system
M. Alderighi, D. Crosetto, S. D'Angelo, G. Sechi
This report describes an implementation, on the 3D-flow system developed at the Superconducting Super Collider Laboratory, of the algorithms and equipment to recognize valid photon events using a morphological analysis of the signals of an intensified CCD in photon counting mode. The analysis consists of calculating the coordinates of a matrix corresponding to the exact position of each incident photon on the channel plate. Several off-line calculations, with efficiency studies aimed at finding the best algorithm for event reconstruction, have been performed. This off-line algorithm can be executed in real time at the CCD input rate (up to 2000 frames/sec). The communication-intensive nature of the algorithm, the topology of this application, and the particular architecture of the 3D-flow system lead to a very efficient implementation. The existing hardware simulator allows studies of the entire system before actual construction.
DOI: https://doi.org/10.1109/ICAPP.1995.472265 (published 1995-04-19)
Citations: 1