
Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing: Latest Articles

A communication framework for heterogeneous distributed pattern analysis
Gernot A. Fink, N. Jungclaus, Helge Ritter, G. Sagerer
Unlike traditional approaches to parallel or distributed processing, where well-structured problems are normally implemented entirely within a single programming environment, we are faced with the problem of integrating existing heterogeneous software systems. Furthermore, pattern analysis places special demands on communication capabilities. We therefore propose a new communication framework dedicated to heterogeneous pattern analysis systems that handles typed structured data, enables completely symmetric interaction, and provides various call semantics. A first prototype that evaluates some of these concepts in practical situations is presented.
DOI: 10.1109/ICAPP.1995.472283 (https://doi.org/10.1109/ICAPP.1995.472283). Published: 1995-04-19.
Citations: 42
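As a rough illustration of the three properties the abstract names (typed structured messages, symmetric peers, and several call semantics), the sketch below models them in-process with Python threads. Everything here (the `Message` and `Peer` classes, the handler registry, and the three method names) is a hypothetical stand-in, not the framework's actual API, and no real networking is involved.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical sketch: the message type and peer API are illustrative only.
@dataclass(frozen=True)
class Message:
    kind: str                                            # selects the receiver's handler
    body: Dict[str, Any] = field(default_factory=dict)   # structured payload

Handler = Callable[[Message], Message]

class Peer:
    """A symmetric endpoint: every peer can both serve requests and issue calls."""

    _pool = ThreadPoolExecutor(max_workers=4)  # shared workers for non-blocking calls

    def __init__(self, name: str) -> None:
        self.name = name
        self._handlers: Dict[str, Handler] = {}

    def register(self, kind: str, handler: Handler) -> None:
        self._handlers[kind] = handler

    def _dispatch(self, msg: Message) -> Message:
        if msg.kind not in self._handlers:
            raise KeyError(f"{self.name} has no handler for {msg.kind!r}")
        return self._handlers[msg.kind](msg)

    def call_sync(self, other: "Peer", msg: Message) -> Message:
        """Blocking request/reply."""
        return other._dispatch(msg)

    def call_async(self, other: "Peer", msg: Message) -> Future:
        """Non-blocking request; the reply arrives through a Future."""
        return Peer._pool.submit(other._dispatch, msg)

    def send_oneway(self, other: "Peer", msg: Message) -> None:
        """Fire-and-forget: no reply is delivered to the caller."""
        Peer._pool.submit(other._dispatch, msg)
```

Because every `Peer` both registers handlers and issues calls, either side can initiate an interaction; the three methods differ only in whether, and how, the caller receives the reply.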
Asynchronous interaction in massively parallel computing
V.L. Varscavsky
From the standpoint of hardware experts, asynchronism is connected with the concept of physical time as an independent physical variable and is determined by variations in the durations of transient processes in hardware circuits, modules and blocks, which are by nature physical objects. Software and architecture experts treat asynchronism as a partial order on events that are logical objects, i.e. they think in terms of logical time. In these terms, asynchronism is the variation in the number of process steps, irrespective of the real duration of these steps in physical time. The tool for measuring time is a clock, and the precision we can attain for the clock (together with the signal-delivery system) determines its area of application (the allowed value of the physical time step). The basic idea of self-timing is to detect the moments when transient processes in physical components are over and to produce corresponding logical signals that provide the transition to logical time (delay-insensitive design), regardless of the causes of delay variation. Once all the logical signals that are invariant to physical time and represent the events in the system have been formed, self-timed methodology offers a number of efficient hardware methods for coordinating the events of the corresponding concurrent specification.
DOI: 10.1109/ICAPP.1995.472302 (https://doi.org/10.1109/ICAPP.1995.472302). Published: 1995-04-19.
Citations: 1
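The abstract describes self-timing as detecting when transient processes have finished and turning that into logical events. A standard self-timed building block for combining such completion signals is the Muller C-element; the abstract does not name it, so treat this as an illustrative choice. The sketch below models a C-element and uses it to merge "done" signals from processes whose delays are arbitrary (the example delay values are made up).

```python
class CElement:
    """Muller C-element: output follows the inputs when they all agree, else holds."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, *inputs: int) -> int:
        if all(i == 1 for i in inputs):
            self.out = 1
        elif all(i == 0 for i in inputs):
            self.out = 0
        # Mixed inputs: keep the previous output (state-holding element).
        return self.out

def completion_time(done_times):
    """Time at which a C-element over per-process 'done' signals fires.

    The completion delays are arbitrary and unknown in advance; the combined
    signal rises only once every process has finished, whatever the order.
    """
    c = CElement()
    inputs = [0] * len(done_times)
    for t, idx in sorted((t, i) for i, t in enumerate(done_times)):
        inputs[idx] = 1
        if c.update(*inputs) == 1:
            return t
    raise ValueError("no processes")
```

No clock period appears anywhere: the combined event is ordered after all individual completions purely by the logic, which is the move from physical to logical time the abstract describes.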
Orchid: the design of a parallel and portable software platform for local area networks
G. Manis, K. Voliotis, C. Lekatsas, P. Tsanakas, G. Papakonstantinou
Orchid is a portable software platform that aims to decouple parallel software development from the underlying system. Because it has a layered structure, Orchid can easily be ported to different architectures by reconstructing only its lowest level. It also provides advanced facilities not supported by most operating systems and software platforms.
DOI: 10.1109/ICAPP.1995.472299 (https://doi.org/10.1109/ICAPP.1995.472299). Published: 1995-04-19.
Citations: 4
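The portability argument (only the lowest layer is rebuilt per architecture) can be sketched as an abstract transport interface with upper layers written purely against it. The class and method names below are hypothetical and do not describe Orchid's real layers; `InMemoryTransport` simply stands in for one target architecture.

```python
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable

class Transport(ABC):
    """Lowest layer (hypothetical): the only code rewritten per architecture."""

    @abstractmethod
    def send(self, dest: int, data: bytes) -> None: ...

    @abstractmethod
    def recv(self, me: int) -> bytes: ...

class InMemoryTransport(Transport):
    """Stand-in 'architecture' that delivers messages through in-process queues."""

    def __init__(self) -> None:
        self._queues: Dict[int, Deque[bytes]] = defaultdict(deque)

    def send(self, dest: int, data: bytes) -> None:
        self._queues[dest].append(data)

    def recv(self, me: int) -> bytes:
        return self._queues[me].popleft()

class MessageLayer:
    """Upper layer: uses only the Transport interface, so it ports unchanged."""

    def __init__(self, transport: Transport, rank: int) -> None:
        self.transport = transport
        self.rank = rank

    def send_text(self, dest: int, text: str) -> None:
        self.transport.send(dest, text.encode("utf-8"))

    def recv_text(self) -> str:
        return self.transport.recv(self.rank).decode("utf-8")

    def broadcast(self, ranks: Iterable[int], text: str) -> None:
        # Built entirely from send_text, so it needs no per-architecture code.
        for r in ranks:
            if r != self.rank:
                self.send_text(r, text)
```

Porting to a LAN would then mean writing, for example, a socket-based `Transport` subclass, while `MessageLayer` stays untouched.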
Asynchronous interaction in massively parallel computing systems
V. Varshavsky, V.B. Marakhovsky, R.A. Lashevsky
This paper discusses the problems that arise when designing massively parallel computer systems. Most of them are resolved by a transition from globally synchronized operation of such systems to globally asynchronous behavior. This transition implies that all local processes in the system must interact with each other through asynchronous interfaces. The paper considers the handshake-based asynchronous interaction of local processes with the system that coordinates them globally, as well as self-timed data transmission between processes. If the system modules that realize local processes are not asynchronous and are implemented in CMOS technology, current indication is used to detect when the transient processes in them have completed. A current-sensor circuit is proposed that tolerates a wide range of variation in the measured current and has acceptable characteristics. Two ways of organizing the interaction between circuits with current sensors are developed. The principles of self-timed data exchange between local processes of the system are considered, as is data transmission using a dual-rail code and a binary code with a handshake for every bit. The possibility of organizing a single-wire bit handshake is demonstrated, and a self-timed implementation of it is developed whose transmission rate is no worse than that of the two-wire bit handshake.
DOI: 10.1109/ICAPP.1995.472230 (https://doi.org/10.1109/ICAPP.1995.472230). Published: 1995-04-19.
Citations: 6
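Dual-rail transmission, mentioned in the abstract, lets a receiver tell from the data wires alone when a word has fully arrived. The sketch below shows the standard encoding (bit 1 on the "true" rail, bit 0 on the "false" rail, all-low as the spacer between words) and a receiver that latches rails in whatever order they arrive. The function names are illustrative, and the paper's single-wire bit handshake is not modeled.

```python
from typing import List, Sequence, Tuple

Rails = Tuple[int, int]        # (true rail, false rail) for one bit
SPACER: Rails = (0, 0)         # "no data" state between codewords

def encode(bits: Sequence[int]) -> List[Rails]:
    """Dual-rail encode: 1 -> (1, 0), 0 -> (0, 1)."""
    return [(1, 0) if b else (0, 1) for b in bits]

def is_complete(word: Sequence[Rails]) -> bool:
    """Completion detection: every bit has exactly one rail high."""
    return all(t + f == 1 for t, f in word)

def is_spacer(word: Sequence[Rails]) -> bool:
    """Return-to-zero detection: every rail is low again."""
    return all((t, f) == SPACER for t, f in word)

def decode(word: Sequence[Rails]) -> List[int]:
    if not is_complete(word):
        raise ValueError("codeword incomplete or invalid")
    return [t for t, _ in word]

def receive(arrivals: Sequence[Tuple[int, Rails]], width: int) -> List[int]:
    """Latch rails as they arrive (any order, any delays) and decode as soon as
    the word is complete; no clock is needed to know the data is valid."""
    word = [SPACER] * width
    for index, rails in arrivals:
        word[index] = rails
        if is_complete(word):
            return decode(word)
    raise ValueError("transfer never completed")
```

In a four-phase protocol the sender would return all rails to the spacer after acknowledgement, which `is_spacer` detects before the next word is accepted.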
Fault-tolerant orthogonal fat-trees as interconnection networks
M. Valerio, L. Moser, P. Melliar-Smith
Orthogonal fat-trees are a type of interconnection network with several desirable characteristics: short distances between processors, constant degree of the switching elements, uniform traffic load, symmetry, and recursive scalability. We first show how to build two-level orthogonal fat-trees, in which each node has a fixed degree and the distance between any two leaves is at most two. We then show how to provide fault tolerance by including redundant paths, at the cost of reducing the number of leaves. Finally, we show how to construct large orthogonal fat-trees recursively from two-level fat-trees.
DOI: 10.1109/ICAPP.1995.472263 (https://doi.org/10.1109/ICAPP.1995.472263). Published: 1995-04-19.
Citations: 18
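The trade-off in the abstract (fixed degree, leaves at most two hops apart, redundancy bought with fewer leaves) can be checked mechanically for any two-level layout. In the sketch below, each level-1 switch is treated as a leaf and lists the level-2 switches it connects to. Two leaves are two hops apart exactly when they share a level-2 switch, and sharing at least two gives a path that survives one failed level-2 switch. The Fano-plane layout is a hypothetical example chosen because any two of its lines meet in one point; it is not claimed to be the paper's construction.

```python
from itertools import combinations

# Hypothetical example layout: level-1 switch k connects to the level-2
# switches in FANO[k] (the seven lines of the Fano plane).
FANO = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]

def check_two_level(level1_links):
    """Return (level-1 degrees, level-2 degrees, min shared parents over all pairs)."""
    deg1 = {len(parents) for parents in level1_links}
    deg2_count = {}
    for parents in level1_links:
        for p in parents:
            deg2_count[p] = deg2_count.get(p, 0) + 1
    deg2 = set(deg2_count.values())
    # >= 1 shared parent: distance two; >= 2: tolerates one level-2 failure.
    min_shared = min(len(a & b) for a, b in combinations(level1_links, 2))
    return deg1, deg2, min_shared
```

With switch degree 3 throughout, the Fano layout connects 7 leaves with exactly one shared parent per pair, while fully connecting 3 leaves to the same 3 level-2 switches gives three disjoint two-hop paths per pair: more redundancy, fewer leaves.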
Embedded real-time video decompression algorithm and architecture for HDTV applications
R. Neogi
DCT/IDCT-based source coding and decoding techniques are widely accepted in HDTV systems and other MPEG-based applications. In this paper, we propose a new direct 2-D IDCT algorithm based on a parallel divide-and-conquer approach. The algorithm distributes computation by considering one transformed coefficient at a time, performing a partial computation and update as each coefficient arrives. A novel parallel, fully pipelined architecture with an effective processing time of one cycle per pixel for an N×N block is designed to implement the algorithm. A unique feature of this architecture is that it integrates inverse shuffling, inverse quantization, inverse source coding, and motion compensation into a single compact data path. We avoid inserting a FIFO between the bit-stream decoder and the decompression engine. The entire block of pixel values is sampled in a single cycle for post-processing after decompression. Also, we use only (N/2)(N/2+1)/2 multipliers and N² adders.
DOI: 10.1109/ICAPP.1995.472212 (https://doi.org/10.1109/ICAPP.1995.472212). Published: 1995-04-19.
Citations: 1
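The coefficient-at-a-time idea can be modeled in software: each arriving coefficient F(u, v) adds its scaled 2-D basis image to a running output block, so reconstruction starts before the whole block has been decoded. This is a plain sequential model assuming orthonormal DCT-II scaling, not the paper's pipelined hardware; `dct2` is included only to produce test coefficients.

```python
import math
from typing import Iterable, List, Tuple

def basis(u: int, x: int, n: int) -> float:
    """Orthonormal 1-D DCT-II basis value C(u) * cos((2x+1) u pi / 2n)."""
    c = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    return c * math.cos((2 * x + 1) * u * math.pi / (2 * n))

def dct2(block: List[List[float]]) -> List[List[float]]:
    """Direct forward 2-D DCT, used here only to produce test coefficients."""
    n = len(block)
    return [[sum(block[x][y] * basis(u, x, n) * basis(v, y, n)
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def incremental_idct2(coeffs: Iterable[Tuple[int, int, float]], n: int) -> List[List[float]]:
    """Inverse 2-D DCT that consumes one coefficient (u, v, F) at a time.

    Each coefficient updates every output pixel with its share of the inverse
    transform; the arrival order does not affect the final block.
    """
    out = [[0.0] * n for _ in range(n)]
    for u, v, value in coeffs:
        if value == 0.0:
            continue  # zero coefficients (common after quantization) cost nothing
        for x in range(n):
            row = value * basis(u, x, n)
            for y in range(n):
                out[x][y] += row * basis(v, y, n)
    return out
```

For example, an 8×8 block whose only nonzero coefficient is F(0, 0) = 8 reconstructs to a flat block of ones, since each pixel receives 8 · (1/√8)².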
An efficient linear systolic algorithm for recovering longest common subsequences
G. Luce, J. Myoupo
This paper presents an implementable linear systolic array of m cells that computes both a longest common subsequence (LCS) and its length in time n+3m+p-1, where m ≤ n and p is the length of the LCS. Our algorithm can be extended to recover more than one LCS. Another important property of our algorithm is that each element of an LCS is extracted together with its ranks in the input sequences A and B, respectively. Thus we can precisely locate the elements of A and B that match each other. In practice, this information is essential in some situations.
DOI: 10.1109/ICAPP.1995.472166 (https://doi.org/10.1109/ICAPP.1995.472166). Published: 1995-04-19.
Citations: 7
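The output format the abstract emphasizes (each LCS element together with its positions in A and B) can be reproduced with ordinary sequential dynamic programming. The sketch below is that sequential algorithm, with O(len(A)·len(B)) time and 0-based indices; it does not model the systolic array or its n+3m+p-1 schedule.

```python
from typing import List, Sequence, Tuple

def lcs_with_positions(a: Sequence, b: Sequence) -> List[Tuple[object, int, int]]:
    """Return one LCS of a and b as (element, index_in_a, index_in_b) triples."""
    m, n = len(a), len(b)
    # L[i][j] = LCS length of the suffixes a[i:] and b[j:]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk forward from (0, 0), recording where each matched element sits.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append((a[i], i, j))
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return out
```

Because each triple carries both indices, matching elements of A and B can be paired directly without a second pass.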
On deflection worm routing on meshes
A. Roberts, A. Symvonis
In this paper, we consider the deflection worm routing problem on two-dimensional n×n meshes. Our results include: (i) an off-line algorithm for routing permutations in O(kn) steps, and (ii) a general method for obtaining deflection worm routing algorithms from packet routing algorithms.
DOI: 10.1109/ICAPP.1995.472207 (https://doi.org/10.1109/ICAPP.1995.472207). Published: 1995-04-19.
Citations: 5
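Since the paper derives worm routing algorithms from packet routing algorithms, it may help to see the packet case. The sketch below is generic deflection ("hot-potato") routing of single-flit packets on an n×n mesh: every packet moves each step, each directed link carries at most one packet, and a packet whose distance-reducing links are taken is deflected onto any free link. The tie-breaking by packet id and the E/W/N/S preference order are arbitrary assumptions, and k-flit worms are not modeled.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def out_links(pos: Pos, n: int) -> List[str]:
    """Directions that stay inside the n x n mesh."""
    x, y = pos
    return [d for d, (dx, dy) in DIRS.items() if 0 <= x + dx < n and 0 <= y + dy < n]

def profitable(pos: Pos, dest: Pos) -> List[str]:
    """Directions that reduce the Manhattan distance to dest."""
    (x, y), (tx, ty) = pos, dest
    return [d for d, ok in (("E", tx > x), ("W", tx < x), ("N", ty > y), ("S", ty < y)) if ok]

def deflection_step(packets: Dict[int, Tuple[Pos, Pos]], n: int) -> Dict[int, Tuple[Pos, Pos]]:
    """Move every packet one hop; each directed link is used by at most one packet."""
    used = set()
    moved = {}
    for pid in sorted(packets):          # lower ids choose first
        pos, dest = packets[pid]
        free = [d for d in out_links(pos, n) if (pos, d) not in used]
        if not free:
            raise ValueError(f"more packets at {pos} than outgoing links")
        good = [d for d in profitable(pos, dest) if d in free]
        d = good[0] if good else free[0]  # deflect when no profitable link is free
        used.add((pos, d))
        dx, dy = DIRS[d]
        moved[pid] = ((pos[0] + dx, pos[1] + dy), dest)
    return moved

def route(packets: Dict[int, Tuple[Pos, Pos]], n: int, max_steps: int = 10_000) -> int:
    """Number of synchronous steps until every packet has reached its destination."""
    live = {p: v for p, v in packets.items() if v[0] != v[1]}
    steps = 0
    while live:
        if steps >= max_steps:
            raise RuntimeError("did not finish (possible livelock)")
        live = {p: v for p, v in deflection_step(live, n).items() if v[0] != v[1]}
        steps += 1
    return steps
```

For example, two packets starting at (0, 0) and both bound for (2, 0) on a 3×3 mesh finish in 4 steps: one travels straight in 2 steps, while the other is deflected north once and needs two extra hops.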
A comparison between the powers of the PARBS and the RMBM
K. Miyashita, Y. Tsujino, N. Tokura
The Processor Array with Reconfigurable Bus System (PARBS) and the Reconfigurable Multiple Bus Machine (RMBM) are models of parallel computation based on reconfigurable buses and processor arrays. The PARBS is a processor array whose processors are arranged in a 2-dimensional grid with a reconfigurable bus system. The RMBM also consists of processors and a reconfigurable bus system, but its processors are arranged in a row, and the number of processors and the number of buses are independent of each other. In this paper, we show that the computational power of the PARBS equals that of the RMBM, provided that both models are polynomially bounded. This is because each model can be simulated by the other in constant time.
DOI: 10.1109/ICAPP.1995.472234 (https://doi.org/10.1109/ICAPP.1995.472234). Published: 1995-04-19.
Citations: 1
Package of iterative solvers for the Fujitsu VPP500 parallel supercomputer
Z. Leyk, M. Dow
We are implementing iterative methods on the VPP500 parallel computer, and during this process we have encountered various kinds of problems. Performance on the VPP500 clearly depends critically on the type of matrices used in the computations. In sparse computations, it is important to exploit the structure of the matrix: the performance obtained from a matrix stored in the diagonal format can differ greatly from that of one stored in a more general format. It is therefore necessary to choose an appropriate storage format for each matrix used in computations. Preliminary tests show that the implementation of the package scales with the number of processors, especially for large problems. It is becoming clear to us that traditional efficient preconditioning techniques yield a speedup of at most a factor of 2. We need to look for new preconditioners better suited to parallel computation. The polynomial preconditioning approach is attractive because its preprocessing cost is negligible. We favour the reverse communication interface for the added flexibility needed to test different storage formats and preconditioners. We conclude that it is crucial to experiment with existing parallel machines to better understand effects that are difficult to derive from theory, such as the impact of communication costs or of the way data is stored.
DOI: 10.1109/ICAPP.1995.472196 (https://doi.org/10.1109/ICAPP.1995.472196). Published: 1995-04-19.
Citations: 0
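Two ideas from this abstract, diagonal (DIA) storage and a reverse communication interface, fit together naturally: the solver never sees the matrix and instead asks the caller for matrix-vector products, so the storage format can be swapped freely. The sketch below uses a Python generator as the reverse-communication channel for plain conjugate gradients (assuming a symmetric positive definite matrix, with no preconditioner). The row-aligned DIA layout shown is one convention among several, and none of this reflects the package's actual interface.

```python
import math
from typing import Generator, List, Sequence

Vector = List[float]

def dia_matvec(offsets: Sequence[int], diags: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """y = A x for A in diagonal (DIA) storage.

    Assumed layout: diags[k][i] holds A[i][i + offsets[k]]; slots that would
    fall outside the matrix are ignored. Each diagonal is one long unit-stride
    loop, which is what makes the format attractive on vector hardware.
    """
    n = len(x)
    y = [0.0] * n
    for off, d in zip(offsets, diags):
        for i in range(max(0, -off), min(n, n - off)):
            y[i] += d[i] * x[i + off]
    return y

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def cg_reverse(b: Sequence[float], tol: float = 1e-10,
               max_iter: int = 1000) -> Generator[Vector, Vector, Vector]:
    """Conjugate gradients that yields p and expects the caller to send back A @ p."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol:
            break
        ap = yield p                      # request: "please compute A @ p"
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def solve_dia(offsets, diags, b):
    """Driver: answers the solver's matvec requests using DIA storage."""
    solver = cg_reverse(b)
    try:
        request = next(solver)
        while True:
            request = solver.send(dia_matvec(offsets, diags, request))
    except StopIteration as finished:
        return finished.value
```

A preconditioner would fit the same pattern: the solver yields a second kind of request and the caller applies, for instance, a polynomial in A using the same `dia_matvec`.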