Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472283
Gernot A. Fink, N. Jungclaus, Helge Ritter, G. Sagerer
Unlike traditional approaches to parallel or distributed processing, in which well-structured problems are implemented entirely within some programming environment, we are faced with the problem of integrating existing heterogeneous software systems. Furthermore, pattern analysis places special demands on communication capabilities. We therefore propose a new communication framework dedicated to heterogeneous pattern analysis systems that handles typed structured data, enables completely symmetric interaction, and provides various call semantics. A first prototype evaluating some of these concepts in practical situations is presented.
Title: A communication framework for heterogeneous distributed pattern analysis. In: Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472302
V.L. Varscavsky
From the standpoint of hardware experts, asynchronism is connected with the concept of physical time as an independent physical variable and is determined by the variations of transient-process durations in hardware circuits, modules and blocks, which are physical objects by their nature. Software and architecture experts treat asynchronism as a partial order on events, which are logical objects; that is, they think in terms of logical time. In these terms, asynchronism is the variation of the number of process steps without regard to the real duration of these steps in physical time. The measuring tool for time is a clock, and the attainable precision of the clock (along with the system of signal delivery) determines its area of application (the allowed value of the physical time step). The basic idea of self-timing is to detect the moments when transient processes in physical components are over and to produce the corresponding logical signals that provide the transition to logical time (delay-insensitive design), regardless of the causes of delay variation. Once all the logical signals that are invariant to physical time and represent the events in the system are formed, self-timed methodology offers a number of efficient hardware support methods to coordinate the events of the corresponding concurrent specification.
Title: Asynchronous interaction in massively parallel computing.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472299
G. Manis, K. Voliotis, C. Lekatsas, P. Tsanakas, G. Papakonstantinou
Orchid is a portable software platform that aims to decouple parallel software development from the underlying system. Owing to its layered structure, Orchid can easily be ported to different architectures by rewriting only its lowest layer. It also provides advanced facilities not supported by most operating systems and software platforms.
Title: Orchid: the design of a parallel and portable software platform for local area networks.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472230
V. Varshavsky, V.B. Marakhovsky, R.A. Lashevsky
We discuss the problems that arise when designing massively parallel computer systems. Most of them are resolved by the transition from globally synchronized operation to globally asynchronous behavior. This transition implies that all the local processes in the system should interact with each other through asynchronous interfaces. We consider the problems of asynchronous interaction of local processes, with their global coordination based on handshaking, as well as the problems of self-timed data transmission between processes. If the system modules that realize local processes are not asynchronous and are implemented in CMOS technology, the idea of current indication is used to detect the moments when their transient processes complete. A current-sensor circuit is suggested with a wide range of permissible variation of the measured current and acceptable characteristics. Two ways of organizing the interaction between circuits with current sensors are developed. The principles of self-timed data exchange between local processes of the system, and of data transmission by means of a dual-rail code and a binary code with a handshake for every bit, are considered. The possibility of organizing a single-wire bit handshake is demonstrated, and a self-timed implementation of it is developed whose transmission rate is no worse than that of the two-wire bit handshake.
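The dual-rail code mentioned in the abstract can be illustrated in a few lines: each data bit travels on two wires, with one legal code word per bit value and an all-zero spacer between words, so the receiver can detect completion without a clock. This is a sketch of the encoding convention only (assuming the common true-wire/false-wire assignment), not the paper's current-sensor circuits:

```python
def dual_rail_encode(bits):
    """Dual-rail code: each data bit uses a (true_wire, false_wire) pair.
    (1, 0) encodes 1 and (0, 1) encodes 0; the all-(0, 0) spacer that
    separates code words lets the receiver detect word completion."""
    return [(1, 0) if b else (0, 1) for b in bits]

def dual_rail_decode(pairs):
    """Decode one complete code word; (0, 0) pairs would mean the word
    has not yet fully arrived."""
    assert all(p in ((1, 0), (0, 1)) for p in pairs), "incomplete word"
    return [1 if p == (1, 0) else 0 for p in pairs]
```

Because exactly one wire of each pair rises per word, the receiver knows a word is complete when every pair is non-zero, which is the completion-detection property the self-timed transmission schemes above rely on.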
Title: Asynchronous interaction in massively parallel computing systems.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472263
M. Valerio, L. Moser, P. Melliar-Smith
Orthogonal fat-trees are a type of interconnection network with several desirable characteristics: short distance between processors, constant degree of the switching elements, uniform traffic load, symmetry, and recursive scalability. We first show how to build two-level orthogonal fat-trees, where each node has a fixed degree and there is a maximum distance of two between any two leaves. We then show how to provide fault tolerance by including redundant paths at the cost of reducing the number of leaves. Finally, we show how to construct large orthogonal fat-trees from two-level fat-trees recursively.
Title: Fault-tolerant orthogonal fat-trees as interconnection networks.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472212
R. Neogi
DCT/IDCT-based source coding and decoding techniques are widely accepted in HDTV systems and other MPEG-based applications. In this paper, we propose a new direct 2-D IDCT algorithm based on the parallel divide-and-conquer approach. The algorithm distributes computation by considering one transformed coefficient at a time and performing partial computation and updating as each coefficient arrives. A novel parallel, fully pipelined architecture with an effective processing time of one cycle per pixel for an N×N block is designed to implement the algorithm. A unique feature of this architecture is that it integrates inverse shuffling, inverse quantization, inverse source coding, and motion compensation into a single compact datapath. We avoid inserting a FIFO between the bit-stream decoder and the decompression engine. The entire block of pixel values is sampled in a single cycle for post-processing after decompression. Moreover, we use only (N/2(N/2+1))/2 multipliers and N² adders.
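For reference, the transform that the architecture above computes is the standard N×N inverse DCT-II. The following is a naive O(N⁴) definition-by-the-formula sketch for checking results against, not the paper's direct parallel divide-and-conquer algorithm:

```python
import math

def idct2(F):
    """Naive N x N 2-D inverse DCT-II, straight from the definition:
    f(x,y) = sum_{u,v} c(u) c(v) F(u,v) cos((2x+1)u*pi/2N) cos((2y+1)v*pi/2N),
    with c(0) = sqrt(1/N) and c(k) = sqrt(2/N) otherwise."""
    N = len(F)
    c = lambda k: math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * F[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s
    return out
```

A quick sanity check: a block whose only non-zero coefficient is the DC term F[0][0] = 8 for N = 8 must decode to a flat block of 1.0, since the DC basis function is the constant 1/N.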
Title: Embedded real-time video decompression algorithm and architecture for HDTV applications.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472166
G. Luce, J. Myoupo
This paper presents an implementable linear systolic array of m cells which computes both a longest common subsequence (LCS) and its length in time n+3m+p-1, where m ≤ n and p is the length of the LCS. Our algorithm can be extended to recover more than one LCS. Another important property of our algorithm is that each element of an LCS is extracted together with its ranks in A and B, respectively; thus we can localize precisely the elements of A and B that match each other. In practice, this information is essential in some situations.
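As a sequential reference for the computation the systolic array parallelizes, here is the classic O(mn) dynamic-programming LCS with a traceback that also reports the ranks of each matched element in A and B, the property the abstract highlights. This is a sketch of the problem, not the authors' systolic algorithm:

```python
def lcs_with_ranks(a, b):
    """Classic dynamic-programming LCS. Returns a list of
    (element, rank_in_a, rank_in_b) triples (ranks are 1-based),
    so matching positions in both sequences can be localized."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back to recover one LCS together with the matched ranks.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append((a[i - 1], i, j))
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return list(reversed(out))
```

Enumerating alternative tie-breaks in the traceback yields the "more than one LCS" extension mentioned above.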
Title: An efficient linear systolic algorithm for recovering longest common subsequences.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472207
A. Roberts, A. Symvonis
In this paper, we consider the deflection worm routing problem on two-dimensional n×n meshes. Our results include: (i) an off-line algorithm for routing permutations in O(kn) steps, and (ii) a general method for obtaining deflection worm routing algorithms from packet routing algorithms.
Title: On deflection worm routing on meshes.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472234
K. Miyashita, Y. Tsujino, N. Tokura
The Processor Array with Reconfigurable Bus System (PARBS) and the Reconfigurable Multiple Bus Machine (RMBM) are models of parallel computation based on reconfigurable buses and processor arrays. The PARBS is a processor array whose processors are arranged in a two-dimensional grid with a reconfigurable bus system. The RMBM is also made up of processors and a reconfigurable bus system, but its processors are located in a row, and the number of processors and the number of buses are independent of each other. In this paper, we show that the computational power of the PARBS is equal to that of the RMBM, provided that both models are polynomially bounded, because each model can simulate the other in constant time.
Title: A comparison between the powers of the PARBS and the RMBM.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472196
Z. Leyk, M. Dow
We are implementing iterative methods on the VPP500 parallel computer. During this process we have met different kinds of problems. Performance on the VPP500 depends critically on the type of matrices used in the computations. In sparse computations, it is important to take advantage of the structure of the matrix: there can be a big difference between the performance obtained from a matrix stored in diagonal format and one stored in a more general format, so it is necessary to choose an appropriate format for the matrix used in the computations. Preliminary tests show that the implementation of the package is scalable with respect to the number of processors, especially for large problems. It is becoming clear to us that the traditional efficient preconditioning techniques yield a speedup factor of only 2 at best; we need to look for new preconditioners better suited to parallel computation. The polynomial preconditioning approach is attractive because of the negligible preprocessing cost involved. We favour a reverse communication interface for the added flexibility necessary for testing different storage formats and preconditioners. We conclude that it is crucial to experiment with existing parallel machines to better understand effects that are difficult to derive from theory, such as the impact of communication costs or of the way data are stored.
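The reverse communication interface favoured above can be sketched concisely: the solver never sees the matrix; it hands a vector back to the caller and waits for the matrix-vector product, so any storage format or preconditioner stays outside the solver. The names `cg_reverse` and `solve` are hypothetical illustrations (a plain conjugate-gradient skeleton using a Python generator), not the API of the VPP500 package:

```python
def cg_reverse(n, b, tol=1e-10, maxit=1000):
    """Conjugate gradient for SPD systems, reverse-communication style:
    yields ("matvec", v) requests and is sent back A @ v by the caller."""
    x = [0.0] * n
    r = list(b)                      # residual for the initial guess x = 0
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(maxit):
        Ap = yield ("matvec", p)     # caller computes A @ p any way it likes
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def solve(A, b):
    """Driver: answers the solver's matvec requests with a dense product;
    a sparse or distributed format would slot in here unchanged."""
    gen = cg_reverse(len(b), b)
    request = next(gen)
    try:
        while True:
            _, v = request
            Av = [sum(aij * vj for aij, vj in zip(row, v)) for row in A]
            request = gen.send(Av)
    except StopIteration as stop:
        return stop.value
```

Swapping the dense product in `solve` for a diagonal-format or preconditioned product changes nothing inside the solver, which is exactly the flexibility the abstract is after.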
Title: Package of iterative solvers for the Fujitsu VPP500 parallel supercomputer.