
Proceedings Sixth International Parallel Processing Symposium: Latest Articles

A conceptual framework for implementing neural networks on massively parallel machines
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.222973
Magali E. Azema-Barac
This paper describes a framework for implementing neural networks on massively parallel machines. The framework is generic: it applies to a range of neural networks (Multi Layer Perceptron, Competitive Learning, Self-Organising Map, etc.) as well as a range of massively parallel machines (Connection Machine, Distributed Array Processor, MasPar). It consists of two phases: an abstract decomposition of the neural network and a machine-specific decomposition. The abstract decomposition identifies the parallelism exhibited by neural networks and provides alternative distribution schemes depending on which parallelism is to be exploited. The machine-specific decomposition considers the relevant machine criteria and integrates them with the result of the abstract decomposition to form a 'decision' system. This system formalises the relative gain of each distribution scheme according to neural network and machine criteria, then identifies possible optimisations of each scheme. Finally, it computes and ranks the absolute speedup of each distribution scheme.
Citations: 6
A structuring technique for compute-aggregate-broadcast algorithms on distributed memory computers
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.223079
A. W. Kwan, L. Bic
A technique for structuring compute-aggregate-broadcast algorithms on distributed-memory computers is presented. The compute-aggregate-broadcast paradigm gives the programmer an abstraction of the problem that separates computation from synchronization. Such algorithms are well suited to distributed-memory computers. The structuring technique assists the parallel programmer with synchronization, allowing the programmer to concentrate on writing the computational code. Two examples are presented.
Citations: 2
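The compute-aggregate-broadcast paradigm in the Kwan and Bic abstract above can be illustrated with a small sequential simulation. This is a sketch of the general paradigm under stated assumptions, not the authors' structuring technique: each inner list is assumed to model one node's local memory, and `cab_step` and `normalize_distributed` are hypothetical names. The compute phase touches only local data, the aggregate phase is a reduction, and the broadcast phase hands the same result to every node, so all cross-node interaction is confined to `cab_step`.

```python
import math
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def cab_step(local_data: List[List[T]],
             compute: Callable[[List[T]], R],
             aggregate: Callable[[R, R], R]) -> List[R]:
    """One compute-aggregate-broadcast round, simulated sequentially.

    Each element of local_data stands for one node's local partition
    (a hypothetical stand-in for distributed memory).
    """
    # Compute: every node works only on its own partition, no communication.
    partials = [compute(part) for part in local_data]
    # Aggregate: combine the partial results (a reduction tree on a real machine).
    total = reduce(aggregate, partials)
    # Broadcast: every node receives the same aggregated value.
    return [total for _ in local_data]


def normalize_distributed(local_data: List[List[float]]) -> List[List[float]]:
    """Scale a distributed vector to unit Euclidean length."""
    sums = cab_step(local_data,
                    lambda xs: sum(x * x for x in xs),
                    lambda a, b: a + b)
    # Next compute phase: each node rescales its own partition with the broadcast value.
    return [[x / math.sqrt(s) for x in part] for part, s in zip(local_data, sums)]


print(normalize_distributed([[3.0], [4.0]]))  # [[0.6], [0.8]]
```

The per-node lambdas are ordinary sequential code; only `cab_step` knows that a reduction and a broadcast happen in between.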
Optimal allocation of shared data over distributed memory hierarchies
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.222974
E. Haddad
Nonreplicated shared data of distributed applications is optimally allocated to prespecified multilevel memory partitions at the sites of a heterogeneous multicomputer network so as to minimize a weighted combination of systemwide mean time delay and mean communication cost per access request. Greedy and fast optimization algorithms are presented for nonqueueing, lightly loaded system models as well as for heavily loaded multiqueue system models with channel, I/O, and memory-hierarchy queues. Extensions to data with nonuniform access demand rates and distinct query and update statistics are also presented.
Citations: 1
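A much simplified version of the allocation problem in the Haddad abstract above shows why a greedy rule can be optimal. The sketch assumes unit-size, nonreplicated items, a fixed access rate per item, no queueing, and a per-access cost that depends only on the partition (a weighted sum of its mean delay and communication cost); the paper's models, with channel, I/O and memory-hierarchy queues, are richer. Under these assumptions, giving the highest-rate items the cheapest slots minimizes total weighted cost by an exchange argument: if a hotter item sat in a costlier slot than a colder one, swapping them could not increase the cost. `Partition` and `greedy_allocate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Partition:
    name: str
    mean_delay: float   # mean delay per access at this partition (assumed known)
    comm_cost: float    # mean communication cost per access (assumed known)
    capacity: int       # number of unit-size data items it can hold


def greedy_allocate(access_rates: Dict[str, float],
                    partitions: List[Partition],
                    weight: float = 0.5) -> Dict[str, str]:
    """Place each nonreplicated, unit-size item in exactly one partition.

    Minimizes sum(rate * (weight * delay + (1 - weight) * comm_cost)) under the
    simplifying assumption that per-access cost depends only on the partition.
    """
    if sum(p.capacity for p in partitions) < len(access_rates):
        raise ValueError("not enough capacity for all items")

    def cost(p: Partition) -> float:
        return weight * p.mean_delay + (1.0 - weight) * p.comm_cost

    # List the slots from the cheapest partition to the most expensive one.
    slots: List[str] = []
    for p in sorted(partitions, key=cost):
        slots.extend([p.name] * p.capacity)
    # Hottest items get the cheapest slots.
    hottest_first = sorted(access_rates, key=access_rates.get, reverse=True)
    return dict(zip(hottest_first, slots))
```

With queueing, the cost of a partition grows with the load placed on it, so this simple sort no longer suffices; that is the harder case the abstract addresses.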
A scheme for state change in a distributed environment using weighted throw counting
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.222992
K. Rokusawa, N. Ichiyoshi
This paper proposes a scheme for changing the execution state of a pool of processes in a distributed environment where some processes may be in transit. The scheme uses weighted throw counting to detect the completion of a state change, and it can detect termination as well. It works whether the communication channels are synchronous or asynchronous, FIFO or non-FIFO. The message complexity of the scheme is typically O(number of processing elements).
Citations: 2
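The Rokusawa and Ichiyoshi abstract above detects completion with weighted throw counting. The sketch below illustrates the closely related weight-throwing (credit-recovery) idea for termination detection, in the style of Huang's algorithm; it is an assumed stand-in, not the authors' exact scheme, and it does not model the state-change protocol. Weights are exact `Fraction`s, a message in transit carries part of its sender's weight so in-flight work is never overlooked, and `deliver(index)` may pick any pending message to mimic non-FIFO channels. The class and method names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


class WeightThrowingDetector:
    """Single-threaded simulation of weight-throwing termination detection.

    Invariant: controller weight + weights held by processes
    + weights carried by in-transit messages == 1.
    """

    def __init__(self, n: int) -> None:
        self.controller = Fraction(1)
        self.weights: List[Fraction] = [Fraction(0)] * n
        self.in_transit: List[Tuple[int, Fraction]] = []  # (destination, weight)

    def start(self, pid: int) -> None:
        # The controller hands its whole weight to the initial process.
        self.weights[pid] += self.controller
        self.controller = Fraction(0)

    def send(self, src: int, dst: int) -> None:
        # Only an active process (one holding weight) may send; it gives away half.
        if self.weights[src] == 0:
            raise ValueError("idle processes cannot send")
        half = self.weights[src] / 2
        self.weights[src] -= half
        self.in_transit.append((dst, half))

    def deliver(self, index: int = 0) -> None:
        # Any pending message may be delivered next (non-FIFO channels).
        dst, w = self.in_transit.pop(index)
        self.weights[dst] += w  # receiving a message (re)activates dst

    def go_idle(self, pid: int) -> None:
        # An idle process returns all of its weight to the controller.
        self.controller += self.weights[pid]
        self.weights[pid] = Fraction(0)

    def terminated(self) -> bool:
        # All weight is back: no process is active and nothing is in transit.
        return self.controller == 1
```

In a run where process 0 sends one message each to processes 1 and 2 and then goes idle, termination is not reported until both messages have been delivered (in either order) and both receivers have gone idle, because each undelivered message still holds part of the weight.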
Processor assignment in heterogeneous parallel architectures
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.223049
D. Menascé, S. Porto, S. Tripathi
It has already been demonstrated that cost-effective multiprocessor designs can be obtained by combining processors of different speeds in the same architecture (a heterogeneous architecture), so that the serial and critical portions of an application can benefit from a single fast processor. The paper presents a systematic way to build static heuristic scheduling algorithms for such environments. Several algorithms are proposed and their performance is compared through simulation. One of the proposed algorithms is shown to achieve substantial performance gains as the degree of heterogeneity of the architecture increases.
Citations: 30
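Static heuristics for heterogeneous processors of the kind described in the Menascé, Porto and Tripathi abstract above typically combine a task priority with a processor-selection rule. The sketch below uses one common combination, critical-path (bottom-level) priority plus earliest-finish-time processor selection, purely as an illustration; it is not claimed to be one of the paper's algorithms. It assumes a task graph with known work amounts, execution time equal to work divided by processor speed, and no communication delays; `eft_schedule` is a hypothetical name.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def eft_schedule(work: Dict[str, float],
                 preds: Dict[str, List[str]],
                 speeds: List[float]) -> Dict[str, Tuple[int, float, float]]:
    """Static list scheduling of a task DAG on processors of different speeds.

    work[t]  : amount of work in task t (run time on processor i is work[t] / speeds[i])
    preds[t] : tasks that must finish before t may start (missing key = no predecessors)
    Returns t -> (processor index, start time, finish time).
    """
    succs: Dict[str, List[str]] = {t: [] for t in work}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)

    @lru_cache(maxsize=None)
    def bottom_level(t: str) -> float:
        # Longest chain of work from t to an exit task: the critical-path priority.
        return work[t] + max((bottom_level(s) for s in succs[t]), default=0.0)

    free_at = [0.0] * len(speeds)
    schedule: Dict[str, Tuple[int, float, float]] = {}
    remaining = set(work)
    while remaining:
        ready = [t for t in remaining
                 if all(p in schedule for p in preds.get(t, []))]
        # Highest bottom level first; the task name breaks ties deterministically.
        t = max(ready, key=lambda x: (bottom_level(x), x))
        earliest = max((schedule[p][2] for p in preds.get(t, [])), default=0.0)
        # Pick the processor on which t would finish earliest.
        best = min(range(len(speeds)),
                   key=lambda i: max(free_at[i], earliest) + work[t] / speeds[i])
        start = max(free_at[best], earliest)
        finish = start + work[t] / speeds[best]
        schedule[t] = (best, start, finish)
        free_at[best] = finish
        remaining.remove(t)
    return schedule
```

For a fork-join graph with work a=2, b=4, c=4, d=2 (b and c depend on a, d on both) and speeds [2.0, 1.5], this sketch puts the serial tasks a and d on the faster processor 0, along with c, and runs b on processor 1 in parallel with c.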
Parallel heap operations on EREW PRAM: summary of results
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.223027
Weixiong Zhang, R. Korf
The authors present parallel algorithms for heap operations on an EREW PRAM. They first present a parallel heap construction algorithm that runs in O(n/p + log p) time with p processors; in the worst case it takes 3.625n/p + 4 log p time. The algorithm is optimal when p = Θ(n/log n). They then propose a method for deleting the root of a heap in parallel. To facilitate dynamic processor allocation, a data structure is developed in a preparatory step using O((n/log n)^(1-1/p)) processors in O(log p) time. A sequence of root deletions is then realized in which each deletion takes O((log n)/p + log p + log log n) time using p processors. The authors also give an O((log n)/p + log p)-time optimal parallel insertion algorithm using p processors. When p = Θ((log n)/log log n), both algorithms run in O(log log n) time. The algorithms can also be extended to a parallel algorithm that deletes an arbitrary element from a heap, given the element's address.
Citations: 4
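The most basic source of parallelism in heap construction, which the Zhang and Korf abstract above improves on, is that nodes on the same level of a binary heap root disjoint subtrees. The sketch below runs the usual bottom-up heapify level by level; every sift-down within a level could be given to its own processor on an EREW PRAM, since no two of them ever touch the same cell. It is a sequential simulation, not the paper's algorithm: with one processor per node this naive version needs O(log² n) parallel time, which is why more elaborate techniques are needed to reach the O(n/p + log p) bound in the abstract. Function names are hypothetical.

```python
from typing import List


def sift_down(a: List[int], i: int, n: int) -> None:
    """Standard min-heap sift-down of a[i] within a[:n]."""
    while True:
        smallest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and a[left] < a[smallest]:
            smallest = left
        if right < n and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest


def level_parallel_heapify(a: List[int]) -> int:
    """Build a min-heap in place, one tree level at a time (deepest first).

    Returns the number of level rounds, i.e. parallel steps if every
    sift-down in a round ran on its own processor.
    """
    n = len(a)
    last_internal = n // 2 - 1
    if last_internal < 0:
        return 0  # 0 or 1 element: already a heap
    depth = (last_internal + 1).bit_length() - 1  # level of the last internal node
    rounds = 0
    for level in range(depth, -1, -1):
        first = (1 << level) - 1                           # leftmost node on this level
        last = min((1 << (level + 1)) - 2, last_internal)  # rightmost internal node
        # These sift-downs start in disjoint subtrees, so they are independent.
        for i in range(first, last + 1):
            sift_down(a, i, n)
        rounds += 1
    return rounds
```

Processing levels deepest first gives the same result as the standard right-to-left heapify, because only the order within a level changes and those sift-downs never interact.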
A fast parallel scheduler for resource requests implemented using optical devices
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.223052
T. V. Lakshman, A. Bagchi, K. Rastani
The paper describes a scheme for scheduling uncoordinated resource requests that arrive in parallel. The specific application considered is scheduling transmission requests in ATM switches. The scheme handles both unicast and multicast transmission requests. Two implementations of the scheme using photonic devices are described. A novel aspect of the scheme is its use of photonic devices to implement the heuristic graph-coloring algorithm needed to generate transmission schedules.
Citations: 0
Adaptive graph computations with a connection machine
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.223078
A. Aggarwal, W. T. Ma, G. Sandri, S. Sarkar
Results from parallel computing on a CM-2 Connection Machine are reported for a variety of graph-theoretic models for fitness optimization in evolutionary biology. These computations are among the most complex ever undertaken in this field and make full use of the internal hypercube architecture of the CM-2.
Citations: 1
Analytical modeling of a parallel branch-and-bound algorithm on MIN-based multiprocessors
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.223037
Myung-Kook Yang, C. Das
The authors propose a parallel decomposite best-first search branch-and-bound algorithm for MIN-based (multistage interconnection network) multiprocessors. They start with a new probabilistic model to estimate the number of nodes evaluated by a serial algorithm. The proposed algorithm first decomposes a problem into several subproblems. Each processor executes a serial best-first search to find a local feasible solution. The local solutions are broadcast through the network to compute the final solution. The speedup analysis considers both computation and communication overheads. When communication overhead is taken into account, the parallel decomposite best-first search algorithm performs better than other reported schemes.
Citations: 0
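The decompose, search locally, then combine structure in the Yang and Das abstract above can be sketched on a stand-in problem. The example assumes a 0/1 knapsack instance with positive weights, decomposes it by fixing the decisions for the first `split` items in every possible way, solves each subproblem with an independent serial best-first branch-and-bound using the usual fractional-knapsack upper bound, and takes the maximum of the local results as the final answer. The subproblems run one after another here; on a MIN-based machine each would run on its own processor and the maximum would be computed from the broadcast local solutions. Neither the MIN communication model nor the paper's probabilistic node-count model is represented, and all function names are hypothetical.

```python
import heapq
from itertools import product
from typing import List


def fractional_bound(vals: List[int], wts: List[int], cap: int,
                     k: int, value: int, weight: int) -> float:
    """Upper bound for a node that has decided items 0..k-1.

    Items must be sorted by value/weight ratio, descending; the remaining
    capacity is filled greedily, allowing one fractional item."""
    bound, room = float(value), cap - weight
    for v, w in zip(vals[k:], wts[k:]):
        if w <= room:
            bound += v
            room -= w
        else:
            bound += v * room / w
            break
    return bound


def best_first_knapsack(vals: List[int], wts: List[int], cap: int,
                        k0: int = 0, value0: int = 0, weight0: int = 0) -> int:
    """Serial best-first branch-and-bound over items k0.. (earlier items fixed)."""
    if weight0 > cap:
        return -1  # infeasible subproblem (values are assumed non-negative)
    best = value0
    # heapq is a min-heap, so bounds are negated to pop the most promising node first.
    heap = [(-fractional_bound(vals, wts, cap, k0, value0, weight0), k0, value0, weight0)]
    while heap:
        neg_bound, k, value, weight = heapq.heappop(heap)
        if -neg_bound <= best:
            break  # no open node can beat the incumbent
        if k == len(vals):
            continue
        for take in (True, False):
            v = value + (vals[k] if take else 0)
            w = weight + (wts[k] if take else 0)
            if w > cap:
                continue
            best = max(best, v)  # leaving all remaining items out is feasible
            b = fractional_bound(vals, wts, cap, k + 1, v, w)
            if b > best:
                heapq.heappush(heap, (-b, k + 1, v, w))
    return best


def decomposed_knapsack(values: List[int], weights: List[int], cap: int,
                        split: int = 2) -> int:
    """Fix the first `split` items (in ratio order) every possible way; each fixing
    is an independent subproblem, and the answer is the best local solution."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    vals = [values[i] for i in order]
    wts = [weights[i] for i in order]
    split = min(split, len(vals))
    local = []
    for choice in product((0, 1), repeat=split):  # one subproblem per processor
        v0 = sum(v for v, c in zip(vals, choice) if c)
        w0 = sum(w for w, c in zip(wts, choice) if c)
        local.append(best_first_knapsack(vals, wts, cap, split, v0, w0))
    return max(local)  # combine step: best of the local solutions
```

Because the subproblems share no incumbent during the search, some of them explore nodes a single serial search would have pruned; that extra work, traded against communication, is what a speedup analysis of this kind has to account for.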
The odd-even expansion storage scheme and its implementation issues
Pub Date : 1992-03-01 DOI: 10.1109/IPPS.1992.222969
Zhiyong Liu, Jia-Huai You, Xiaobo Li
The authors present a parallel storage scheme that distributes the elements of an N*N matrix over N memory banks, where N is any (odd or even) power of two, so that the N elements of any row, column, forward or backward diagonal, or square or rectangular block can be accessed simultaneously without memory conflicts. They present a simple address-generation scheme that requires only logic operations and completes in constant time. They also present two network implementations of the data alignments required by this storage scheme. Unlike previously proposed routing algorithms, the hypercube routing algorithms presented here are free of network conflicts. They require no buffering and each 'step' is shorter, so they are more efficient in both hardware cost and speed. The authors also present a simple MIN implementation of the data alignments. Schemes for processing smaller matrices efficiently on larger-scale systems are also developed.
Citations: 3
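The Liu, You and Li abstract above does not give the address function of the odd-even expansion scheme, so the sketch below does not reproduce it. Instead it shows what "conflict-free" means in practice: a checker that tries a bank-mapping function against every N-element access pattern and reports the first pattern in which two cells land in the same bank. Which patterns count is an assumption here (all rows and columns, the main diagonal and anti-diagonal, and every 2^a x 2^b block that fits inside the matrix); `access_patterns` and `first_conflict` are hypothetical names.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Cell = Tuple[int, int]


def access_patterns(N: int) -> Iterator[Tuple[str, List[Cell]]]:
    """Yield every N-element access pattern checked here (N a power of two):
    rows, columns, the main diagonal, the anti-diagonal, and every placement
    of every 2^a x 2^b block with 2^a * 2^b == N that fits inside the matrix."""
    for k in range(N):
        yield f"row {k}", [(k, j) for j in range(N)]
        yield f"col {k}", [(i, k) for i in range(N)]
    yield "diag", [(i, i) for i in range(N)]
    yield "anti-diag", [(i, N - 1 - i) for i in range(N)]
    h = 1
    while h <= N:
        w = N // h
        for r in range(N - h + 1):
            for c in range(N - w + 1):
                yield (f"{h}x{w} block at ({r},{c})",
                       [(r + x, c + y) for x in range(h) for y in range(w)])
        h *= 2


def first_conflict(bank: Callable[[int, int], int], N: int) -> Optional[str]:
    """Return the first pattern whose N cells do not hit N distinct banks, else None."""
    for name, cells in access_patterns(N):
        if len({bank(i, j) for i, j in cells}) < N:
            return name
    return None


N = 4
print(first_conflict(lambda i, j: j, N))            # col 0
print(first_conflict(lambda i, j: (i + j) % N, N))  # diag
```

Storing column j in bank j serializes column accesses, and the familiar skew (i + j) mod N fixes rows and columns but maps the main diagonal onto only the even banks when N is a power of two. A storage scheme of the kind described in the abstract must make `first_conflict` return `None` for every pattern it supports.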