
Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing: Latest Articles

A QoS performance measure framework for distributed heterogeneous networks
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823388
Jong-Kook Kim, D. Hensgen, T. Kidd, H. Siegel, David St. John, C. Irvine, T. Levin, N. W. Porter, V. Prasanna, R. F. Freund
In a distributed heterogeneous computing environment, users' tasks are allocated resources to simultaneously satisfy, to varying degrees, the tasks' different, and possibly conflicting, quality of service (QoS) requirements. When the total demand placed on system resources by the tasks, for a given interval of time, exceeds the resources available, some tasks will receive degraded service or no service at all. One part of a measure to quantify the success of a resource management system (RMS) in such a distributed environment is the collective value of the tasks completed during an interval of time, as perceived by the user, application, or policy maker. The flexible integrated system capability (FISC) ratio introduced here is a measure for quantifying this collective value. The FISC ratio is a multi-dimensional measure, and may include priorities, versions of a task or data, deadlines, situational mode, security, application- and domain-specific QoS, and dependencies. In addition to being used for evaluating and comparing RMS, the FISC ratio can be incorporated as part of the objective function in a system's scheduling heuristics.
Citations: 27
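The abstract above describes the FISC ratio only in words, as a measure of the collective value of completed tasks. The following is a rough sketch of how a ratio of that general shape could be computed; it is not the paper's definition. The `Task` fields, the multiplicative combination of factors, and the function name `collective_value_ratio` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    priority: float               # weight assigned by the user or policy maker
    completed: bool               # whether the RMS finished the task in the interval
    deadline_factor: float = 1.0  # assumed: 1.0 if on time, lower if late
    version_factor: float = 1.0   # assumed: 1.0 for the full version, lower if degraded


def collective_value_ratio(tasks):
    """Value achieved by completed tasks divided by the value of serving all fully."""
    achieved = sum(
        t.priority * t.deadline_factor * t.version_factor
        for t in tasks
        if t.completed
    )
    upper_bound = sum(t.priority for t in tasks)
    return achieved / upper_bound if upper_bound else 0.0


tasks = [
    Task(priority=4, completed=True),
    Task(priority=2, completed=True, deadline_factor=0.5),  # finished late
    Task(priority=2, completed=False),                      # received no service
]
ratio = collective_value_ratio(tasks)
```

A ratio of this kind lies between 0 and 1, so it can be compared across resource management systems or used as a term in a scheduling heuristic's objective function, as the abstract suggests for the FISC ratio.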
An efficient algorithm for the physical mapping of clustered task graphs onto multiprocessor architectures
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823437
N. Koziris, M. Romesis, P. Tsanakas, G. Papakonstantinou
The most important issue in the parallelisation of sequential programs is the efficient assignment of computations to different processing elements. In the past, many approaches have been devoted to efficient program parallelisation, considering various models for the parallel programs and the target architectures. The most widely used parallelism description model is the task graph model with precedence constraints. Nevertheless, as far as the physical mapping of tasks onto parallel architectures is concerned, little research has given practical results. It is well known that the physical mapping problem is NP-hard in the strong sense, which in practice leaves only heuristic approaches. Most researchers or tool programmers use exhaustive algorithms or the classical method of simulated annealing. This paper presents an alternative approach to the mapping problem. Given the graph of clustered tasks and the graph of the target distributed architecture, our heuristic finds a mapping by first placing the highly communicative tasks on adjacent nodes of the processor network. Once these "backbone" tasks are mapped there is no backtracking, which keeps the complexity low. The remaining tasks are then placed, beginning with those closest to the "backbone" tasks. The paper concludes with performance and comparison results which reveal the method's efficiency.
Citations: 85
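The mapping idea in the abstract above (heavily communicating tasks first, on adjacent processors, with no backtracking) can be illustrated with a small greedy sketch. This is not the authors' algorithm: the seed choice, the tie-breaking, and the cost function (communication volume times hop distance) are assumptions, as are the names `hop_distances` and `greedy_map`. It assumes a connected processor network with at least as many processors as tasks.

```python
from collections import deque


def hop_distances(proc_adj):
    """All-pairs hop counts in the processor network, by BFS from every node."""
    dist = {}
    for src in proc_adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in proc_adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return dist


def greedy_map(task_comm, proc_adj):
    """Place one task per processor, never revisiting an earlier placement.

    task_comm: {(task_a, task_b): communication volume}, undirected
    proc_adj:  {processor: set of neighbouring processors}
    """
    dist = hop_distances(proc_adj)
    weight = {}
    for (a, b), w in task_comm.items():
        weight.setdefault(a, {})[b] = w
        weight.setdefault(b, {})[a] = w

    # Seed: an endpoint of the heaviest edge goes on the best-connected processor.
    (first, _), _ = max(task_comm.items(), key=lambda kv: kv[1])
    seed_proc = max(proc_adj, key=lambda p: len(proc_adj[p]))
    mapping = {first: seed_proc}
    free = set(proc_adj) - {seed_proc}

    while len(mapping) < len(weight):
        unmapped = [t for t in sorted(weight) if t not in mapping]

        def attached(t):
            # Communication volume between t and the tasks already placed.
            return sum(w for u, w in weight[t].items() if u in mapping)

        task = max(unmapped, key=attached)

        def cost(p):
            # Volume times hop distance to each already-placed partner of `task`.
            return sum(
                w * dist[p][mapping[u]]
                for u, w in weight[task].items()
                if u in mapping
            )

        proc = min(sorted(free), key=cost)
        mapping[task] = proc
        free.remove(proc)
    return mapping


# Four processors in a line (0-1-2-3) and a chain of four clustered tasks.
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
comm = {("A", "B"): 10, ("B", "C"): 1, ("C", "D"): 8}
placement = greedy_map(comm, line)
```

In this small example the two heavy edges, A-B and C-D, both end up on neighbouring processors. Because each placement is final, the loop runs once per task, which reflects the low-complexity, no-backtracking property the abstract emphasises.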
Tailoring a self-distributing architecture to a cluster computer environment
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823406
R. Moore, B. Klauer, K. Waldschmidt
This paper analyzes the consequences of existing network structure for the design of a protocol for a radical COMA (Cache Only Memory Architecture). Parallel computing today faces two significant challenges: the difficulty of programming and the need to leverage existing "off-the-shelf" hardware. The difficulty of programming parallel computers can be split into two problems: distributing the data, and distributing the computation. Parallelizing compilers address both problems, but have limited application outside the domain of loop intensive "scientific" code. Conventional COMAs provide an adaptive, self-distributing solution to data distribution, but do not address computation distribution. Our proposal leverages parallelizing compilers, and then extends COMA to provide adaptive self-distribution of both data and computation. The radical COMA protocols can be implemented in hardware, software, or a combination of both. When, however, the implementation is constrained to operate in a cluster computing environment (that is, to use only existing, already installed hardware), the protocols have to be reengineered to accommodate the deficiencies of the hardware. This paper identifies the critical quantities of various existing network structures, and discusses their repercussions for protocol design. A new protocol is presented in detail.
Citations: 10
Modelling message-passing programs for static mapping
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823416
C. Roig, A. Ripoll, M. A. Senar, F. Guirado, E. Luque
An efficient mapping of a parallel program onto the processors is vital for achieving high performance on a parallel computer. When the structure of the parallel program, in terms of its task execution times, task dependencies, and amount of communicated data, is known a priori, mapping can be accomplished statically at compile time. Mapping algorithms start from a parallel application model and automatically map tasks to processors in order to minimise the execution time of the program. In this paper we discuss the current models used in mapping parallel programs, the Task Precedence Graph (TPG) and the Task Interaction Graph (TIG), and we define a new model called the Temporal Task Interaction Graph (TTIG). The contribution of the TTIG is that it enhances these two previous models with the ability to explicitly capture the potential degree of parallel execution between adjacent tasks, allowing the development of efficient mapping algorithms. Experiments were performed to show the effectiveness of the TTIG model for a set of graphs. The results are compared with the optimal assignment and with those obtained using the TIG model, and they confirm that better assignments can be obtained using the TTIG model.
Citations: 18
Using agent wills to provide fault-tolerance in distributed shared memory systems
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823426
A. Rowstron
In this paper we describe how we use mobile objects to provide distributed programs coordinating through a persistent distributed shared memory (DSM) with tolerance to sudden agent failure, and use the increasingly popular Linda-like tuple space languages as an example for implementation of the concept. In programs coordinating and communicating through a DSM, a data structure is shared between multiple agents, and the agents update the shared structure directly. However, if an agent should suddenly fail, it is often hard for the agents to make the data structures consistent with the new application state. For example, consider a data structure that contains a list of active agents. In such a case, transactions can be used when adding and removing agent names from the list, ensuring that the data structure is consistent and does not become corrupted should an agent fail. However, if failure of the agent occurs after the name has been added, how does the application ensure the list is correct? We argue that using mobile objects we can provide wills for the agents, effectively enabling them to ensure the shared data structure is application consistent even once they have failed. We show how we have integrated the use of agent wills into a Linda system and show that we have not increased the complexity of program writing. The integration is simple and general, does not alter the underlying semantics of the operations performed in the will, and the use of mobility is transparent to the programmer.
Citations: 4
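The active-agent list example from the abstract above can be made concrete with a toy, single-process tuple space. This is only a sketch of the idea of a will (operations registered in advance and run on an agent's behalf if it fails); it does not reproduce the paper's Linda integration or its use of mobile objects. `out` and `inp` follow the usual Linda operation names, while `rd_all`, `register_will`, and `agent_terminated` are hypothetical names invented for this illustration.

```python
class TupleSpace:
    """Tiny in-process stand-in for a Linda-like tuple space (illustrative only)."""

    def __init__(self):
        self._tuples = []
        self._wills = {}  # agent name -> callable to run if that agent fails

    def out(self, tup):
        """Insert a tuple into the space."""
        self._tuples.append(tuple(tup))

    def inp(self, template):
        """Non-blocking 'in': remove and return the first match, or None."""
        for index, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(index)
        return None

    def rd_all(self, template):
        """Hypothetical helper: return, without removing, every matching tuple."""
        return [tup for tup in self._tuples if self._matches(template, tup)]

    @staticmethod
    def _matches(template, tup):
        # None in a template acts as a wildcard field.
        return len(template) == len(tup) and all(
            want is None or want == have for want, have in zip(template, tup)
        )

    def register_will(self, agent, will):
        """Hypothetical call: store operations to run on the agent's behalf."""
        self._wills[agent] = will

    def agent_terminated(self, agent, failed):
        """Hypothetical hook: the will is executed only if the agent failed."""
        will = self._wills.pop(agent, None)
        if failed and will is not None:
            will(self)


space = TupleSpace()
for name in ("a1", "a2"):
    space.out(("active", name))
    # Each agent's will removes its own entry from the list of active agents.
    space.register_will(name, lambda ts, n=name: ts.inp(("active", n)))

space.agent_terminated("a1", failed=True)  # a1 crashes; its will is executed
active = space.rd_all(("active", None))
```

After the simulated crash only the tuple for `a2` remains, so the shared list again matches the application state without the surviving agent having to detect or repair anything.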
ViMPIOS, a "truly" portable MPI-IO implementation
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823386
Kurt Stockinger, E. Schikuta
We present ViMPIOS, a novel MPI-IO implementation based on ViPIOS, the Vienna Parallel Input Output System. ViMPIOS inherits the defining characteristics of ViPIOS, which makes it a client-server based system focusing on cluster architectures. ViMPIOS stands out from all other MPI-IO implementations by its "truly" portable design, which allows applications not only to be transferred easily between parallel architectures but also to keep their original performance characteristics on the new platform as far as possible. This is achieved by the "smart" AI-blackboard module of ViPIOS, which is responsible for an appropriate data layout. Specifically, in this paper we concentrate on the algorithm which maps MPI-IO data structures onto the respective ViPIOS structures and thus allows the ViPIOS properties to be exploited.
Citations: 8
Consistency requirements of distributed shared memory for Lamport's bakery algorithm for mutual exclusion
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823415
J. Brzeziński, D. Wawrzyniak
As is well known, Lamport's Bakery algorithm for mutual exclusion of n processes is correct if a physically shared memory is used as the communication facility between processes. The use of weaker consistency models (e.g. causal, processor, PRAM), which are available in replicated distributed shared memory (DSM) systems and are appealing because of possible performance improvements, may make the algorithm incorrect. This raises the consistency requirement problem: the problem of finding a weaker consistency model of DSM that is sufficient for the correctness of the algorithm. In this paper, the consistency requirements of distributed shared memory for Lamport's Bakery algorithm for mutual exclusion of n processes are considered. It is proven that the algorithm is correct with a consistency model resulting from a combination of sequential consistency and one of the weakest consistency models, PRAM, without explicit synchronisation. The combination is achieved by specifying the consistency model with write operations on shared locations.
Citations: 2
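For readers unfamiliar with the algorithm that the abstract above analyses, here is a minimal Python sketch of the classic Bakery lock. It runs on CPython threads that share ordinary lists, which stands in for the physically shared, strongly consistent memory under which the algorithm is known to be correct. It does not model the weaker DSM consistency models studied in the paper. The names `run_bakery`, `choosing`, and `number` are illustrative.

```python
import threading
import time


def run_bakery(n_threads=3, rounds=20):
    """Increment a shared counter under a Bakery lock and return the total."""
    choosing = [False] * n_threads  # choosing[i]: thread i is picking a ticket
    number = [0] * n_threads        # number[i]: ticket of thread i, 0 = not competing
    counter = [0]                   # shared data protected by the lock

    def lock(i):
        choosing[i] = True
        number[i] = 1 + max(number)  # take a ticket larger than any seen
        choosing[i] = False
        for j in range(n_threads):
            if j == i:
                continue
            while choosing[j]:       # wait until j has finished picking
                time.sleep(0)
            # Wait while j holds a smaller ticket (ties broken by thread index).
            while number[j] != 0 and (number[j], j) < (number[i], i):
                time.sleep(0)

    def unlock(i):
        number[i] = 0

    def worker(i):
        for _ in range(rounds):
            lock(i)
            value = counter[0]
            time.sleep(0)            # yield inside the critical section on purpose
            counter[0] = value + 1
            unlock(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

The lock uses only reads and writes of shared variables, with no atomic read-modify-write instruction. That is why its correctness depends directly on the order in which writes become visible to other processes, which is the question the paper examines for replicated DSM.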
Heterogeneous client-server architecture for a virtual meeting environment
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823396
M. Masoodian, S. Luz
Magic Lounge is a shared virtual meeting environment which has been designed to support meetings between physically remote people who would like to interact with each other using any one of a number of heterogeneous communication devices. This paper describes the heterogeneous client-server architecture of the Magic Lounge which supports communication between PCs, PDAs, palmtops, and mobile telephones. This architecture combines a number of different technologies, including CORBA and MBone, to provide the necessary means of audio and textual communication between the users of different devices. This paper also discusses the various requirements of this type of meeting environment, as well as describing some of the Magic Lounge software tools and components which have been developed to provide intelligent communication services to its users.
Citations: 5
A robust multigrid solver on parallel computers
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823393
R. Montero, M. Prieto, I. Llorente, F. Tirado
In this paper two well-known robust multigrid solvers for anisotropic operators on structured grids are compared: alternating-plane smoothers with full coarsening, and plane smoothers combined with semicoarsening. The study takes into account not only numerical properties but also architectural ones, focusing on cache memory exploitation and parallel characteristics. Experimental results for the sequential algorithms have been obtained on two different systems based on the MIPS R10000 processor but with different L2 cache sizes (an SGI O2 workstation and an SGI Origin 2000 system). Two different parallel implementations of the latter robust approach have been considered. The first one has optimal parallel characteristics but, due to deterioration of the convergence properties, its realistic efficiency is not satisfactory. In the second one, some processors remain idle for a short period of time on every multigrid cycle; however, the algorithm is more efficient since it preserves the numerical properties of the sequential version. Parallel experiments have also been performed on a Cray T3E system.
Citations: 1
Predictability of bulk synchronous programs using MPI
Pub Date : 2000-01-19 DOI: 10.1109/EMPDP.2000.823402
A. Zavanella, Alessandro Milazzo
The BSP cost model provides a general framework for designing efficient and portable data-parallel algorithms. Execution costs of BSP programs are predicted by combining a limited number of program-dependent and machine-dependent parameters. BSP programs can be written using several programming tools. In this work we explore the predictability of bulk synchronous programs implemented with the Message Passing Interface (MPI). Two classic computational geometry problems, the convex hull (CH) and the lower envelope (LE), are considered as case studies. Efficient BSP algorithms have been implemented using MPI and executed on three different parallel architectures: a Fujitsu AP1000 (distributed memory), a CRAY T3E (distributed shared memory) and a cluster of PCs (Backus). The paper compares the degree of predictability on these architectures, analysing the main sources of error.
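As a rough illustration of how a BSP prediction is assembled from program and machine parameters, the sketch below uses the standard BSP cost model, in which a superstep costs w + h·g + l. The abstract does not state the exact formula or parameter values the authors used, so the class names, function names, and numbers here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Machine:
    """Machine-dependent BSP parameters (hypothetical names)."""
    g: float  # communication gap: time per word of an h-relation
    l: float  # barrier synchronisation latency per superstep


@dataclass(frozen=True)
class Superstep:
    """Program-dependent parameters of one superstep (hypothetical names)."""
    w: float  # maximum local computation time on any processor
    h: int    # maximum number of words sent or received by any processor


def predicted_cost(steps: Iterable[Superstep], machine: Machine) -> float:
    """Sum the standard BSP cost w + h*g + l over all supersteps."""
    return sum(s.w + s.h * machine.g + machine.l for s in steps)


def relative_error(predicted: float, measured: float) -> float:
    """Relative deviation of the prediction from a measured run time."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    machine = Machine(g=2.0, l=20.0)
    program = [Superstep(w=100.0, h=10), Superstep(w=50.0, h=4)]
    cost = predicted_cost(program, machine)
    print(cost, relative_error(cost, measured=200.0))
```

All quantities are assumed to share one time unit. Comparing `predicted_cost` against a measured run time on each machine is one simple way to express a degree of predictability; the paper may well define it differently.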
Citations: 11