
Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183): Latest Publications

Design patterns for parallel computing using a network of processors
S. Siu, Ajit Singh
The high complexity of building parallel applications is often cited as one of the major impediments to the mainstream adoption of parallel computing. To deal with the complexity of software development, abstractions such as macros, functions, abstract data types, and objects are commonly employed by sequential as well as parallel programming models. This paper describes the concept of a design pattern for the development of parallel applications. A design pattern in our case describes a recurring parallel programming problem and a reusable solution to that problem. A design pattern is implemented as a reusable code skeleton for quick and reliable development of parallel applications. A parallel programming system, called DPnDP (Design Patterns and Distributed Processes), that employs such design patterns is described. In the past, parallel programming systems have allowed fast prototyping of parallel applications based on commonly occurring communication and synchronization structures. The uniqueness of our approach is in the use of a standard structure and interface for a design pattern. This has several important implications: first, design patterns can be defined and added to the system's library incrementally, without any major modification of the system (extensibility). Second, a parallel application can be customized by mixing design patterns with low-level parallel code, yielding a flexible and efficient parallel programming tool (flexibility). Third, a parallel design pattern can be parameterized to provide variations in structure and behavior.
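The pattern-as-skeleton idea can be illustrated with a minimal sketch: the skeleton fixes the coordination structure (here a master-worker pattern), and the application plugs in only the sequential pieces. The names `master_worker`, `worker_fn`, and `combine_fn` are hypothetical, not DPnDP's actual interface, which the abstract does not show.

```python
from concurrent.futures import ThreadPoolExecutor

def master_worker(tasks, worker_fn, combine_fn, n_workers=4):
    """Generic master-worker skeleton: the pattern fixes the coordination
    structure (distribute, compute in parallel, combine); the application
    supplies only the sequential pieces worker_fn and combine_fn."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(worker_fn, tasks))
    return combine_fn(results)

# Instantiating the pattern: a parallel sum of squares.
print(master_worker(range(10), lambda x: x * x, sum))  # 285
```

Swapping `worker_fn`/`combine_fn` reuses the same coordination code for a different application, which is the reusability the paper argues for.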
Citations: 33
Channel allocation methods for data dissemination in mobile computing environments
Wang-Chien Lee, Qinglong Hu, Lee
We discuss several channel allocation methods for data dissemination in mobile computing systems. We suggest that broadcast and on-demand channels have different access performance under different system parameters, and that a mobile cell should use a combination of both to obtain optimal access time for a given workload and system parameters. We study the data access efficiency of three channel configurations: all channels used as on-demand channels (exclusive on-demand); all channels used for broadcast (exclusive broadcast); and a mix of on-demand and broadcast channels (hybrid). Simulations on obtaining the optimal channel allocation for lightly-loaded, medium-loaded, and heavily-loaded conditions are conducted, and the results show that an optimal channel allocation significantly improves system performance.
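The hybrid trade-off can be illustrated with a back-of-the-envelope model of our own (not the paper's simulator): broadcast wait grows with the number of items per broadcast channel, on-demand wait grows with queueing load, and an intermediate split can beat both pure configurations. All parameters below are illustrative assumptions.

```python
def hybrid_cost(b, total, hot_frac, n_hot, rate, svc):
    """Expected access time when b of `total` channels broadcast the n_hot
    hot items and the rest serve on-demand requests. Toy model: broadcast
    wait is half a broadcast cycle; on-demand wait is a crude M/M/1
    approximation with the cold load spread evenly over the rest."""
    o = total - b
    w_broadcast = n_hot / (2.0 * b) if b else float("inf")
    cold_rate = (1.0 - hot_frac) * rate
    util = cold_rate * svc / o if o else 1.0
    w_on_demand = svc / (1.0 - util) if util < 1.0 else float("inf")
    return hot_frac * w_broadcast + (1.0 - hot_frac) * w_on_demand

def best_split(total, hot_frac, n_hot, rate, svc):
    """Number of broadcast channels minimizing expected access time."""
    return min(range(total + 1),
               key=lambda b: hybrid_cost(b, total, hot_frac, n_hot, rate, svc))

# 10 channels, 80% of requests hit 100 hot items, 20 requests per time unit:
# the optimum is a hybrid split, not a pure configuration.
print(best_split(10, 0.8, 100, 20, 1.0))  # 5
```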
Citations: 23
Flexible general purpose communication primitives for distributed systems
R. Baldoni, R. Beraldi, R. Prakash
This paper presents the slotted-FIFO communication mode that supports communication primitives for the entire spectrum of reliability and ordering requirements of distributed applications: FIFO as well as non-FIFO, and reliable as well as unreliable communication. Hence, the slotted-FIFO communication mode is suitable for multimedia applications, as well as non real-time distributed applications. As FIFO ordering is not required for all messages, message buffering requirements are considerably reduced. Also, message latencies are lower. We quantify such advantages by means of a simulation study. A low overhead protocol implementing slotted-FIFO communication is also presented. The protocol incurs a small resequencing cost.
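One plausible reading of slotted-FIFO delivery can be sketched as a resequencer that enforces FIFO order only among messages tagged with the same slot, which is why buffering and latency shrink. The class and its `(slot, seq)` message format below are illustrative assumptions, not the paper's protocol.

```python
class SlottedFifoReceiver:
    """Messages carry a (slot, seq) tag; FIFO order is enforced only among
    messages of the same slot, so a slot change flushes whatever is pending
    instead of holding it for missing predecessors. Illustrative sketch."""
    def __init__(self):
        self.slot = None
        self.next_seq = 0
        self.pending = {}
        self.delivered = []

    def receive(self, slot, seq, payload):
        if slot != self.slot:
            # New slot: ordering across slots is not required, so release
            # anything still buffered and restart sequencing.
            for s in sorted(self.pending):
                self.delivered.append(self.pending.pop(s))
            self.slot, self.next_seq = slot, 0
        self.pending[seq] = payload
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

# Within a slot, out-of-order arrivals are resequenced:
r = SlottedFifoReceiver()
r.receive(0, 1, "b"); r.receive(0, 0, "a")
print(r.delivered)  # ['a', 'b']
```

Across a slot boundary the buffered message is released immediately rather than waiting for its predecessor, so less buffering and lower latency at a small resequencing cost.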
Citations: 1
Run-time support for scheduling parallel applications in heterogeneous NOWs
J. Weissman, Xin Zhao
This paper describes the current state of Prophet, a system that provides run-time scheduling support for parallel applications in heterogeneous workstation networks. Prior work on Prophet demonstrated that scheduling SPMD applications could be effectively automated with excellent performance. Prophet has since been enhanced to broaden its use to other application types, including parallel pipelines, and to make more effective use of dynamic system state information to further improve performance. The results indicate that both SPMD and parallel pipeline applications can be scheduled for reduced completion time by exploiting the application structure and run-time information.
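A basic building block of scheduling on heterogeneous nodes is splitting work in proportion to each node's effective speed. The heuristic below is an illustrative stand-in, not Prophet's actual cost model, which the abstract does not describe.

```python
def partition(n_items, speeds):
    """Split n_items among nodes proportionally to their measured speeds,
    handing the rounding remainder to the largest fractional shares.
    An illustrative heuristic, not Prophet's actual cost model."""
    total = sum(speeds)
    shares = [n_items * s / total for s in speeds]
    counts = [int(sh) for sh in shares]
    remainder = n_items - sum(counts)
    by_frac = sorted(range(len(speeds)),
                     key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_frac[:remainder]:
        counts[i] += 1
    return counts

# A node twice as fast gets twice the data.
print(partition(100, [1.0, 1.0, 2.0]))  # [25, 25, 50]
```

Feeding dynamic load measurements back into `speeds` is the kind of run-time information the paper says further improves the schedule.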
Citations: 16
Speed up your database client with adaptable multithreaded prefetching
Nils Knafla
In many client/server object database applications, performance is limited by the delay in transferring pages from the server to the client. We present a prefetching technique that can avoid this delay, especially where there are several database servers. Part of the novelty of this approach lies in the way that multithreading on the client workstation is exploited, in particular for activities such as prefetching and flushing dirty pages to the server. Using our own complex object benchmark we analyze the performance of the prefetching technique with multiple clients, multiple servers and different buffer pool sizes.
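The core mechanism, a background thread that fetches hinted pages into a client-side cache while the application keeps working, can be sketched as follows. `fetch_page`, `hint`, and `get` are hypothetical names standing in for the server round trip and the client API; this is not the paper's implementation.

```python
import queue
import threading

class Prefetcher:
    """Client-side prefetcher: a background thread pulls hinted pages into
    a cache while the application processes the current page."""
    def __init__(self, fetch_page):
        self.fetch = fetch_page          # stand-in for a server round trip
        self.cache = {}
        self.requests = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            page = self.requests.get()
            if page is None:             # shutdown sentinel
                return
            if page not in self.cache:
                self.cache[page] = self.fetch(page)

    def hint(self, page):
        """Called by an access predictor: fetch this page in the background."""
        self.requests.put(page)

    def get(self, page):
        """Serve from cache on a hit; fall back to a synchronous fetch."""
        if page in self.cache:
            return self.cache[page]
        return self.fetch(page)

    def stop(self):
        self.requests.put(None)
        self.worker.join()
```

With several servers, one such worker per server lets prefetches to different servers overlap, which is where the paper sees the largest benefit.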
Citations: 5
Cut-through delivery in Trapeze: An exercise in low-latency messaging
K. Yocum, J. Chase, Andrew J. Gallatin, A. Lebeck
New network technology continues to improve both the latency and bandwidth of communication in computer clusters. The fastest high-speed networks approach or exceed the I/O bus bandwidths of "gigabit-ready" hosts. These advances introduce new considerations for the design of network interfaces and messaging systems for low-latency communication. This paper investigates cut-through delivery, a technique for overlapping host I/O DMA transfers with network traversal. Cut-through delivery significantly reduces end-to-end latency of large messages, which are often critical for application performance. We have implemented cut-through delivery in Trapeze, a new messaging substrate for network memory and other distributed operating system services. Our current Trapeze prototype is capable of demand-fetching 8 K virtual memory pages in 200 μs across a Myrinet cluster of DEC AlphaStations.
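The latency benefit of overlapping transfer stages can be sketched with a simple pipeline model (our own idealized arithmetic, assuming equal-bandwidth stages and ignoring per-chunk overheads; not Trapeze's measured numbers):

```python
def store_and_forward(msg_bytes, n_stages, bw):
    """Each stage (e.g. network link, then host I/O bus DMA) waits for the
    entire message before the next stage starts."""
    return n_stages * (msg_bytes / bw)

def cut_through(msg_bytes, n_stages, bw, chunk_bytes):
    """Stages overlap at chunk granularity: the pipeline drains in the time
    for all chunks through one stage plus one chunk through each other stage."""
    n_chunks = -(-msg_bytes // chunk_bytes)  # ceiling division
    return (n_chunks + n_stages - 1) * (chunk_bytes / bw)

# An 8 KB page over two equal-bandwidth stages, 1 KB chunks:
print(store_and_forward(8192, 2, 1.0))  # 16384.0
print(cut_through(8192, 2, 1.0, 1024))  # 9216.0 -- latency nearly halved
```

As the chunk size shrinks toward zero (and per-chunk overhead is ignored), the overlapped latency approaches that of a single stage, which is why large messages benefit most.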
Citations: 50
Design issues in building Web-based parallel programming environments
K. Dinçer, Geoffrey Fox
We exploited the recent advances in Internet connectivity and Web technologies for building Web-based parallel programming environments (WPPEs) that facilitate the development and execution of parallel programs on remote high-performance computers. A Web browser running on the user's machine provides a user-friendly interface to server-site user accounts and allows the use of parallel computing platforms and software in a convenient manner. The user may create, edit, and execute files through this Web browser interface. This new Web-based client-server architecture has the potential of being used as a future front-end to high-performance computer systems. We discuss the design and implementation of several prototype WPPEs that are currently in use at the Northeast Parallel Architectures Center and the Cornell Theory Center. These initial prototypes support high-level parallel programming with Fortran 90 and High Performance Fortran (HPF), as well as explicit low-level programming with Message Passing Interface (MPI). We detail the lessons learned during the development process and outline the tradeoffs of various design choices in the realization of the design. We especially concentrate on providing server-site user accounts, mechanisms to access those accounts through the Web, and the Web-related system security issues.
Citations: 9
Performance aspects of switched SCI systems
M. Liebhart
The Scalable Coherent Interface (SCI) defines a high-speed interconnect that provides a coherent distributed shared memory system. With the use of switches, separate rings can be connected to form large topology-independent configurations. It has been realized that congestion in SCI systems generates additional retry traffic, which reduces the available communication bandwidth. This paper investigates additional flow control mechanisms for overloaded switches. They are based on a supplementary retry delay and show a significant throughput gain. Furthermore, two different management schemes for the output buffers are investigated. Computer simulations are used to compare the models and to determine system parameters.
Citations: 2
Replaying distributed programs without message logging
Robert H. B. Netzer, Yikang Xu
Debugging long program runs can be difficult because of the delays required to repeatedly re-run the execution. Even a moderately long run of five minutes can incur aggravating delays. To address this problem, techniques exist that allow re-executing a distributed program from intermediate points by using combinations of checkpointing and message logging. In this paper we explore another idea: how to support replay without logging the contents of any message. When no messages are logged, the set of global states from which replay is possible is constrained, and it has been unknown how to compute this set without exhaustively searching the space of all global states, whose size is exponential in the number of processes. We present a simple and efficient hybrid on-the-fly/post-mortem algorithm for detecting the necessary and sufficient conditions under which parts of the execution can be replayed without message logs. A small amount of trace (two vectors) is recorded at each checkpoint and a fast post-mortem algorithm computes global states from which replay can begin. This algorithm is independent of the checkpointing technique used.
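Algorithms of this kind build on the standard consistency condition for checkpoint vector clocks, which can be checked directly. This is an illustrative sketch of that textbook condition; the paper's two-vector per-checkpoint trace and hybrid algorithm refine it rather than match it line for line.

```python
def consistent_cut(vclocks):
    """Checkpoints (one per process, each with a vector clock) form a
    consistent global state iff no checkpoint has observed more events of
    process i than process i's own checkpoint records:
    vclocks[j][i] <= vclocks[i][i] for all i, j."""
    n = len(vclocks)
    return all(vclocks[j][i] <= vclocks[i][i]
               for i in range(n) for j in range(n))

# P1's checkpoint saw 1 of P0's events and P0's saw 1 of P1's: consistent.
print(consistent_cut([[2, 1], [1, 3]]))  # True
# P0's checkpoint saw 4 of P1's events but P1's own checkpoint records
# only 3: a message would be received before it was sent -- inconsistent.
print(consistent_cut([[2, 4], [1, 3]]))  # False
```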
Citations: 20
The software architecture of a virtual distributed computing environment
H. Topcuoglu, S. Hariri, W. Furmanski, J. Valente, Ilkyeun Ra, Dongmin Kim, Yoonhee Kim, Xue Bing, Baoqing Ye
The requirements of grand challenge problems and the deployment of gigabit networks make the network computing framework an attractive and cost-effective environment for interconnecting geographically distributed processing and storage resources. Our project, the Virtual Distributed Computing Environment (VDCE), provides a problem-solving environment for high-performance distributed computing over wide area networks. VDCE delivers well-defined library functions that relieve end users of tedious task implementations and also support reusability. In this paper we present the conceptual design of the VDCE software architecture, which is defined in three modules: (a) the Application Editor, a user-friendly application development environment that generates the Application Flow Graph (AFG) of an application; (b) the Application Scheduler, which provides an efficient task-to-resource mapping of the AFG; and (c) the VDCE Runtime System, which is responsible for running and managing application execution and monitoring the VDCE resources.
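Mapping a task graph like the AFG onto resources is commonly done with list scheduling. The sketch below is a generic heuristic of that kind, an illustrative stand-in for, not a description of, VDCE's Application Scheduler.

```python
from collections import deque

def schedule_afg(tasks, deps, n_nodes=2):
    """List-schedule a task DAG: walk tasks in topological order and place
    each on the node that frees up earliest. `tasks` maps name -> cost;
    `deps` is a list of (predecessor, successor) edges."""
    indeg = {t: 0 for t in tasks}
    succs = {t: [] for t in tasks}
    for a, b in deps:
        succs[a].append(b)
        indeg[b] += 1
    ready = deque(t for t in tasks if indeg[t] == 0)
    node_free = [0.0] * n_nodes
    finish = {}
    while ready:
        t = ready.popleft()
        # A task may start once all predecessors are done and a node is free.
        pred_done = max((finish[p] for p, s in deps if s == t), default=0.0)
        node = min(range(n_nodes), key=lambda i: node_free[i])
        finish[t] = max(node_free[node], pred_done) + tasks[t]
        node_free[node] = finish[t]
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return finish

# Diamond-shaped flow graph on two nodes: a and b run in parallel,
# then c, then d; the makespan is 6 time units.
print(schedule_afg({"a": 2, "b": 3, "c": 1, "d": 2},
                   [("a", "c"), ("b", "c"), ("c", "d")]))
```

A real scheduler would weight task costs by per-resource speeds and communication delays; this sketch keeps only the DAG-ordering core.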
Citations: 29