Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing: Latest Articles

Restructuring the flow of image and video processing programs to increase instruction level parallelism

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.905066

M. Maresca, N. Zingirian

This paper addresses the problem of preparing efficient implementations of Image Processing (IP) tasks for Instruction Level Parallel (ILP, i.e., superscalar and pipelined) architectures. First, it presents an accurate analysis of ILP architectures and IP task structures. This analysis identifies specific sources of inefficiency that affect typical implementations of IP programs for ILP architectures. Then, it introduces a novel processing model, named Bucket Processing (BP), aimed at reducing the inefficiencies of IP programs characterized by nested loops, typical of image processing, and by conditional statements in the innermost loop bodies. Finally, it describes how BP restructures the program flow so as to deliver significant speedup in programs running on real ILP platforms.
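
The abstract does not describe how Bucket Processing works internally, so the following is only a sketch of one plausible reading: instead of branching on every pixel inside the nested loops, pixel coordinates are first sorted into per-branch "buckets", and each bucket is then processed by a loop whose body contains no conditional. The function names, the threshold test, and the two per-pixel operations are all hypothetical. Python is used only to show the control-flow restructuring; no ILP speedup should be expected from it.

```python
def process_naive(image, threshold):
    """Nested loops with a conditional in the innermost body."""
    out = []
    for row in image:
        out_row = []
        for p in row:
            if p > threshold:          # data-dependent branch per pixel
                out_row.append(p * 2)  # hypothetical "bright" operation
            else:
                out_row.append(p + 1)  # hypothetical "dark" operation
        out.append(out_row)
    return out


def process_bucketed(image, threshold):
    """Same result, but pixels are classified into buckets first."""
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[0] * width for _ in range(height)]

    # Pass 1: classify pixel coordinates into one bucket per branch.
    bright, dark = [], []
    for y, row in enumerate(image):
        for x, p in enumerate(row):
            (bright if p > threshold else dark).append((y, x))

    # Pass 2: each bucket is processed by a loop with a branch-free body.
    for y, x in bright:
        out[y][x] = image[y][x] * 2
    for y, x in dark:
        out[y][x] = image[y][x] + 1
    return out
```

Both functions return the same output image; the bucketed version trades an extra classification pass for innermost loops that apply a single operation uniformly.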

Citations: 0

Programming cooperative systems in Drago

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.905045

Javier Miranda, F. Santana, A. Alvarez, S. Arévalo

Drago is an experimental Ada extension designed to facilitate the implementation of fault-tolerant and cooperative distributed applications. It is the result of an effort to impose discipline and give linguistic support to the main concepts of the group communication paradigm. In this paper we focus our attention on the Drago linguistic support for the implementation of distributed cooperative applications. We introduce Drago and give some simple examples of its use.

Citations: 0

Coarse reconfigurable multimedia unit extension

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.905048

Stephan Wong, S. Cotofana, S. Vassiliadis

In this paper we introduce a coarse reconfigurable multimedia functional unit (rMFU) extension to a superscalar general-purpose processor (GPP) and a set of specialized multimedia instructions to extend the GPP's instruction set. Two multimedia operations, the DCT operation and the Huffman encoding operation, were chosen to assess the expected performance of our proposal. The performance of the extended processor including the rMFU was evaluated using modified versions of the ijpeg and mpeg2enc benchmarks and a cycle-accurate simulator. Our experiments suggest that using the rMFU in an out-of-order superscalar processor (without increasing the cycle time) decreases the total number of execution cycles by between 12.40% and 23.72% compared to the same processor without such a unit. Moreover, the number of executed instructions is reduced by between 13.67% and 23.61% and the number of executed branches by between 9.83% and 15.98%.

Citations: 17

MPI collective communication operations on large shared memory systems

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.905038

M. Bernaschi, G. Richelli

Collective communication performance is critical in a number of MPI applications, yet relatively few results are available to assess the performance of MPI implementations, especially for shared memory multiprocessors. In this paper we focus on the most widely used primitive, broadcast, and present experimental results for the Sun Enterprise 10000. We compare the performance of the Sun MPI primitives with our implementation based on a quasi-optimal algorithm. Our tests highlight advantages and drawbacks of vendors' implementations of collective communication primitives and suggest that the choice of the best algorithm may depend on exogenous factors like load balancing among tasks.
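
The abstract does not say which quasi-optimal algorithm the authors implemented. As a point of reference only, the sketch below computes the communication schedule of a standard binomial-tree broadcast, a common textbook choice in which the number of processes holding the data doubles each round. It is a pure-Python simulation and makes no MPI calls; the function name and the `(sender, receiver)` representation are assumptions made for illustration.

```python
def binomial_broadcast_schedule(num_procs, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver).

    Ranks are handled relative to the root, so relative rank 0 is the root.
    After round k, relative ranks [0, 2**(k+1)) hold the data.
    """
    rounds = []
    have = 1  # relative ranks [0, have) currently hold the data
    while have < num_procs:
        step = []
        for rel in range(have):
            partner = rel + have
            if partner < num_procs:
                sender = (rel + root) % num_procs
                receiver = (partner + root) % num_procs
                step.append((sender, receiver))
        rounds.append(step)
        have *= 2
    return rounds


if __name__ == "__main__":
    for i, step in enumerate(binomial_broadcast_schedule(8)):
        print(f"round {i}: {step}")
```

For 8 processes the schedule has 3 rounds, and every non-root rank receives the data exactly once. A flat broadcast, where the root sends to each rank in turn, would need 7 sequential sends from the root instead.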

Citations: 12

The SDAARC architecture

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.905071

R. Moore, B. Klauer, K. Waldschmidt

While traditional parallel computing systems are still struggling to gain wider acceptance, the largest parallel computer that has ever been available is currently growing with the communication resource that is the Internet. Unfortunately, it is also rarely used in the parallel computation field. The main reason for the rejection of parallel computers is the difficulty of parallel programming. In this paper we propose the Self Distributing Associative ARChitecture (SDAARC). It is derived from the Cache Only Memory Architecture (COMA). COMAs provide a distributed shared memory (DSM) with automatic distribution of data. We show how this paradigm of data distribution can be extended to the automatic distribution of instruction sequences (microthreads). We show how microthreads can be extracted from legacy C code to produce code that can automatically be parallelized by SDAARC at run time. We also discuss how SDAARC can be implemented on tightly coupled multiprocessor systems, on heterogeneous LAN-based computer networks (Intranet), and on WANs of computing resources.

Citations: 5

Heterogeneous matrix-matrix multiplication or partitioning a square into rectangles: NP-completeness and approximation algorithms

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.905056

Olivier Beaumont, Vincent Boudet, Arnaud Legrand, F. Rastello, Y. Robert

In this paper, we deal with two geometric problems arising from heterogeneous parallel computing: how to partition the unit square into $p$ rectangles of given areas $s_1, s_2, \ldots, s_p$ (such that $\sum_{i=1}^{p} s_i = 1$), so as to minimize either (i) the sum of the perimeters of the $p$ rectangles or (ii) the largest perimeter of the $p$ rectangles. For both problems, we prove NP-completeness and introduce approximation algorithms.
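
To make the two objectives concrete, the sketch below evaluates one simple family of partitions, in which the unit square is cut into vertical columns and each column is cut into stacked rectangles. This layout is chosen here for illustration and is not claimed to be the authors' algorithm; the function names are hypothetical. The lower bound uses the fact that a rectangle of area $s$ has perimeter at least $4\sqrt{s}$, with equality for a square.

```python
import math


def column_partition_perimeters(columns):
    """Perimeters of the rectangles in a column-based partition.

    `columns` is a list of lists of areas. Each inner list is one vertical
    column of the unit square: its width is the sum of its areas (the
    column has height 1), and each rectangle's height is area / width.
    """
    perimeters = []
    for col in columns:
        width = sum(col)
        for area in col:
            height = area / width
            perimeters.append(2 * (width + height))
    return perimeters


def perimeter_sum_lower_bound(areas):
    """Sum over rectangles of 4 * sqrt(area): no partition can do better."""
    return 4 * sum(math.sqrt(a) for a in areas)


if __name__ == "__main__":
    areas = [0.25, 0.25, 0.25, 0.25]
    grid = column_partition_perimeters([[0.25, 0.25], [0.25, 0.25]])
    strip = column_partition_perimeters([[0.25, 0.25, 0.25, 0.25]])
    print(sum(grid), max(grid))    # objective (i) and (ii) for a 2x2 grid
    print(sum(strip), max(strip))  # same objectives for four thin strips
    print(perimeter_sum_lower_bound(areas))
```

With four equal areas, the 2x2 grid gives four squares and meets the lower bound, while a single column of thin strips gives a larger perimeter sum and a larger maximum perimeter.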

Citations: 13

Parallel simulated annealing for the delivery problem

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.905046

Z. Czech

A delivery problem that reduces to an NP-complete set-partitioning problem is considered. Two algorithms of parallel simulated annealing, i.e., simultaneous independent searches and simultaneous periodically interacting searches, are investigated. The objective is to improve the accuracy of solutions to the problem by applying parallelism. The accuracy of a solution is defined as its proximity to the optimum solution. The empirical evidence, supported by statistical analysis, indicates that the interaction of processes in parallel simulated annealing can yield more accurate solutions to the delivery problem than when the processes run independently.
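
The difference between the two schemes can be sketched as follows. Everything here is an assumption made for illustration: the objective is a toy one-dimensional function rather than the delivery problem, the "processes" are simulated sequentially in one Python process, and the interaction rule (every process restarts from the best state of the round) is one simple choice, since the abstract does not specify how the searches interact.

```python
import math
import random


def toy_cost(x):
    """Hypothetical multimodal objective standing in for the real problem."""
    return x * x + 10 * math.sin(3 * x)


def anneal_steps(x, temp, steps, rng):
    """Run `steps` Metropolis moves from x at a fixed temperature."""
    for _ in range(steps):
        cand = x + rng.uniform(-1.0, 1.0)
        delta = toy_cost(cand) - toy_cost(x)
        # Always accept improvements; accept worse moves with prob e^(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x = cand
    return x


def parallel_sa(num_procs, rounds, steps_per_round, interact, seed=0):
    """Return (best_x, best_cost) over all simulated processes."""
    rngs = [random.Random(seed + i) for i in range(num_procs)]
    states = [rng.uniform(-10, 10) for rng in rngs]
    best = min(states, key=toy_cost)
    temp = 10.0
    for _ in range(rounds):
        states = [anneal_steps(x, temp, steps_per_round, rng)
                  for x, rng in zip(states, rngs)]
        round_best = min(states, key=toy_cost)
        if toy_cost(round_best) < toy_cost(best):
            best = round_best
        if interact:
            # Periodic interaction: all processes continue from the round's best.
            states = [round_best] * num_procs
        temp *= 0.9  # geometric cooling
    return best, toy_cost(best)


if __name__ == "__main__":
    print(parallel_sa(4, 20, 50, interact=False, seed=1))
    print(parallel_sa(4, 20, 50, interact=True, seed=1))
```

Setting `interact=False` gives independent searches whose results are only compared at the end; setting `interact=True` shares the best state after every round. Which variant does better on this toy function depends on the seed and parameters, so the sketch illustrates the mechanism rather than the paper's empirical finding.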

Citations: 16

COOPE: a tool for representing concurrent object-oriented program execution through visualisation

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.905016

Hugo Leroux, C. Exton

There has been a move to introduce concurrency and object-orientation in the undergraduate curriculum. However, both introduce challenging new concepts to students. Despite these challenges, the benefits gained from learning concurrent object-oriented programming are numerous. Visualisation holds great promise in expediting comprehension of such complex issues. The aim of this paper is to discuss the potential of our visualisation tool, COOPE, to assist students in comprehending the complexities of concurrent object-oriented programs. We thus present some broad requirements of a visualisation tool and discuss the design and implementation of COOPE.

Citations: 6

On the relative behavior of source and distributed routing in NOWs using Up*/Down* routing schemes

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.904962

J. Sancho, A. Robles, J. Duato

Networks of workstations (NOWs) are arranged as switch-based networks with irregular topology, which makes routing and deadlock avoidance quite complicated. Current proposals use the up*/down* routing algorithm to remove cyclic dependencies between channels and avoid deadlock. Recently, we proposed a simple and effective methodology to compute up*/down* routing tables. The resulting up*/down* routing scheme increases the number of alternative paths between every pair of switches and allows most messages to follow minimal paths. Also, up*/down* routing can be implemented using either source or distributed routing. Source routing provides a safer and lower-cost implementation of up*/down* routing than distributed routing. However, distributed routing may benefit from routing messages through alternative paths to reach their destination. In this paper we evaluate the performance of up*/down* routing when using two methodologies to compute routing tables, and when both source and distributed routing are used. Evaluation results show that it is not worthwhile to implement up*/down* routing in a distributed way in a NOW environment, since its performance is very close to that achieved with source routing when a traffic-balancing algorithm is used. Moreover, it is shown that a greater improvement in performance can be achieved by modifying the method used to compute up*/down* routing tables when source routing is used.
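
For readers unfamiliar with up*/down* routing, the sketch below shows the classic rule in its usual BFS-based form, which is not necessarily the authors' newer table-computation methodology. Each switch gets a level from a breadth-first search starting at a root; a link is traversed "up" when it leads to a switch with a lower level, or to a lower ID at the same level. A route is legal only if it never takes an up link after a down link, and this restriction is what removes cyclic channel dependencies. The function names and the small example topology are hypothetical.

```python
from collections import deque


def bfs_levels(adj, root):
    """Breadth-first levels of every switch reachable from `root`."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u, v, level):
    """True if traversing the link u -> v goes 'up' (toward the root)."""
    return (level[v], v) < (level[u], u)


def is_legal_path(path, level):
    """A legal route uses zero or more up links, then zero or more down links."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(u, v, level):
            if gone_down:
                return False  # up after down is forbidden
        else:
            gone_down = True
    return True


if __name__ == "__main__":
    # Four switches connected in a ring: 0-1, 0-2, 1-3, 2-3.
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    level = bfs_levels(adj, root=0)
    print(is_legal_path([1, 0, 2], level))  # up, then down
    print(is_legal_path([1, 3, 2], level))  # down, then up
```

In the ring example, the route 1 -> 0 -> 2 is legal, while 1 -> 3 -> 2 is rejected even though it has the same length. Restrictions of this kind are why the method used to compute the routing tables affects how many minimal paths remain available.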

Citations: 2

Design and implementation of a data stabilizing software tool

Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing

Pub Date : 2001-02-07 DOI: 10.1109/EMPDP.2001.905009

V. D. Florio, Geert Deconinck, R. Lauwereins, S. Graeber

We describe a software tool that implements a software system for stabilizing data values, capable of tolerating both permanent faults in memory and transient faults affecting computation, input, and memory devices by means of a strategy coupling temporal and spatial redundancy. The tool maximizes data integrity by allowing a new value to enter the system only after it has successfully passed a user-parameterizable stabilization procedure. Designed and developed in the framework of the ESPRIT project TIRAN, the tool can be used stand-alone but can also be coupled with other dependable mechanisms developed within that project. Its use is currently being investigated within ENEL, the main Italian electricity supplier, to replace a hardware stable storage device adopted in its high-voltage substations.
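
The coupling of temporal and spatial redundancy can be illustrated with a small sketch. This is not the TIRAN tool's API, and the class, its parameters, and its policies are all hypothetical: a new value is committed only after it has been offered a configurable number of consecutive times (temporal redundancy), the committed value is kept in several copies (spatial redundancy), and reads return the majority value and repair any copy that disagrees.

```python
from collections import Counter


class StableCell:
    """Toy stabilized storage cell (illustrative only)."""

    def __init__(self, initial, copies=3, required_repeats=3):
        self._copies = [initial] * copies  # spatially redundant storage
        self._required = required_repeats  # stabilization parameter
        self._candidate = None
        self._streak = 0

    def offer(self, value):
        """Offer a new input value; return True once it has been committed."""
        if self._streak > 0 and value == self._candidate:
            self._streak += 1
        else:
            # A different value restarts the stabilization procedure.
            self._candidate = value
            self._streak = 1
        if self._streak >= self._required:
            self._copies = [value] * len(self._copies)
            return True
        return False

    def read(self):
        """Return the majority value and repair copies that disagree."""
        value, votes = Counter(self._copies).most_common(1)[0]
        if votes * 2 <= len(self._copies):
            raise RuntimeError("no majority among redundant copies")
        self._copies = [value] * len(self._copies)
        return value


if __name__ == "__main__":
    cell = StableCell(0)
    for reading in (5, 5, 5):
        cell.offer(reading)
    print(cell.read())
```

A single glitched input never reaches the stored value because it breaks the streak, and a single corrupted copy is outvoted on the next read. Raising `required_repeats` or `copies` trades latency and memory for tolerance of longer or more widespread faults.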

Citations: 7
