Proceedings International Conference on Parallel Processing: Latest Articles

A best-effort communication protocol for real-time broadcast networks
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040909
Lakshmish Ramaswamy, B. Ravindran
In this paper, we present a best-effort communication protocol, called ABA, that seeks to maximize the aggregate application benefit and deadline-satisfied ratio of asynchronous real-time distributed systems that use CSMA/DDCR broadcast networks. ABA considers an application model where end-to-end timeliness requirements of trans-node application tasks are expressed using Jensen's benefit functions. Furthermore, the protocol assumes that the application is designed using CSMA/DDCR feasibility conditions that are driven by the "best" estimate of upper bounds on message arrival densities that is possible at design-time. When such design-time postulations are violated at run-time, ABA directs message traffic so that only messages that will increase applications' aggregate benefit are transmitted, buffering the others until the workloads again respect their design-time postulated values. To study the performance of ABA, we consider a previously studied algorithm called RBA* as a baseline. Our experimental results indicate that ABA yields higher aggregate benefit and a higher deadline-satisfied ratio than RBA* when, due to the dynamics of the workload, message arrival densities increase at rates faster than or equal to those of process execution latencies.
Citations: 2
MAC layer protocols for real-time traffic in ad-hoc wireless networks
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040911
A. Pal, A. Doğan, F. Özgüner
Providing quality of service (QoS) to high bandwidth video, voice and data applications in wireless networks is an important problem. Such applications are in the class of real-time applications; they need communication operations to complete within certain targeted deadlines. Based on this premise, this paper addresses the design of distributed MAC layer protocols that incorporate explicit support for real-time traffic in an ad-hoc wireless network. Specifically, we have developed two new MAC layer protocols, namely the elimination by sieving (ES-DCF) and the deadline bursting (DB-DCF) protocols. Both protocols use deterministic collision resolution algorithms in order to provide timely delivery guarantees to different classes of real-time traffic. Extensive simulation studies confirmed that ES-DCF and DB-DCF perform well for hard-real-time traffic and soft-real-time traffic, respectively.
Citations: 33
Multi-level shared state for distributed systems
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040867
DeQing Chen, Chunqiang Tang, Xiangchuan Chen, S. Dwarkadas, M. Scott
As a result of advances in processor and network speeds, more and more applications can productively be spread across geographically distributed machines. In this paper we present a transparent system for memory sharing, InterWeave, developed with such applications in mind. InterWeave can accommodate hardware coherence and consistency within multiprocessors (level-1 sharing), software distributed shared memory (S-DSM) within tightly coupled clusters (level-2 sharing), and version-based coherence and consistency across the Internet (level-3 sharing). InterWeave allows processes written in multiple languages, running on heterogeneous machines, to share arbitrary typed data structures as if they resided in local memory. Application-specific knowledge of minimal coherence requirements is used to minimize communication. Consistency information is maintained in a manner that allows scaling to large amounts of shared data. In C, operations on shared data, including pointers, take precisely the same form as operations on non-shared data. We demonstrate the ease of use and efficiency of the system through an evaluation of several applications. In particular, we demonstrate that InterWeave's support for sharing at higher (more distributed) levels does not reduce the performance of sharing at lower (more tightly coupled) levels.
Citations: 25
Load balancing in distributed Web server systems with partial document replication
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040886
Ling Zhuo, Cho-Li Wang, F. Lau
How documents of a Web site are replicated and where they are placed among the server nodes have an important bearing on the balance of load in a geographically distributed Web server (DWS) system. The traffic generated by moving documents at runtime could also affect the performance of the DWS system. In this paper, we prove that minimizing such traffic is NP-hard. We propose a new document distribution scheme that periodically performs partial replication of a site's documents at selected server locations to maintain load balancing. The scheme uses several approximation algorithms to minimize the traffic generated. The simulation results show that this scheme can achieve better load balancing than a dynamic scheme, while the internal traffic it causes has a negligible effect on the system's performance.
Citations: 28
MOBY-a mobile peer-to-peer service and data network
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040900
T. Horozov, A. Grama, V. Vasudevan, Sean Landis
This paper describes the design and implementation of MOBY, a network for mobile peer-to-peer exchange of services and data. Constraints on the computing power of mobile devices, limited hardware, networking, and software resources, and the ad-hoc nature of mobile clients pose considerable challenges for supporting performance goals, ease of service integration, and adaptation. These challenges are addressed in MOBY by dynamic service location and client mapping, surrogates for mobile clients, and standardized interfaces built upon off-the-shelf software components.
Citations: 35
A secure protocol for computing dot-products in clustered and distributed environments
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040894
Ioannis Ioannidis, A. Grama, M. Atallah
Dot-products form the basis of various applications ranging from scientific computations to commercial applications in data mining and transaction processing. Typical scientific computations utilizing sparse iterative solvers use repeated matrix-vector products. These can be viewed as dot-products of sparse vectors. In database applications, dot-products take the form of counting operations. With widespread use of clustered and distributed platforms, these operations are increasingly being performed across networked hosts. Traditional APIs for messaging are susceptible to sniffing, and the data being transferred between hosts is often enough to compromise the entire computation. Due to the large computational requirements of underlying applications, it is highly desirable that secure protocols add minimal overhead to the original algorithm. Finally, by their very nature, dot-products leak limited amounts of information: one of the parties can detect an entry of the other party's vector by simply probing it with a vector with a 1 in a particular location and zeros elsewhere. We present an extremely efficient and sufficiently secure protocol for computing the dot-product of two vectors using linear algebraic techniques. Using analytical as well as experimental results, we demonstrate superior performance in terms of computational overhead, numerical stability, and security. We show that the overhead of a two-party dot-product computation using MPI as the messaging API across two high-end workstations connected via a Gigabit ethernet approaches a multiple of 4.69 over an unsecured dot-product. We also show that the average relative error in dot-products across a large number of random (normalized) vectors was roughly 4.5 × 10^-9.
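The probing leak described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' protocol: `secure_dot` is a hypothetical stand-in for any ideal protocol that reveals only the final dot-product, and the vector `secret` is made-up data.

```python
def secure_dot(x, y):
    """Hypothetical stand-in for an ideal secure protocol: only the result is revealed."""
    return sum(a * b for a, b in zip(x, y))


def probe_entry(other_vector, index):
    """Learn one entry of the other party's vector using a unit-vector probe."""
    probe = [0] * len(other_vector)
    probe[index] = 1  # a 1 in the chosen location, zeros elsewhere
    return secure_dot(probe, other_vector)


secret = [3, -7, 0, 12]  # illustrative data held by the other party
recovered = [probe_entry(secret, i) for i in range(len(secret))]
```

Each probe returns exactly one coordinate, so repeating the computation once per index recovers the whole vector even though every single run discloses only a scalar. This is inherent to the dot-product itself rather than to any particular protocol.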
Citations: 149
Routing permutations with link-disjoint and node-disjoint paths in a class of self-routable networks
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040879
Yuanyuan Yang, Jianchao Wang
High-speed interconnects have been gaining much attention from the computer industry recently as interconnects are becoming a limiting factor to the performance of modern computer systems. This trend will continue in the near future as technology improves. In this paper, we consider efficiently routing permutations in a class of switch-based interconnects. Permutation is an important communication pattern in parallel and distributed computing systems. We present a generic approach to realizing arbitrary permutations in a class of unique-path, self-routable multistage networks. We consider routing arbitrary permutations with link-disjoint paths and node-disjoint paths in such interconnects in a minimum number of passes. In particular, routing with node-disjoint paths has important applications in the emerging optical interconnects. We employ and further expand the Latin square technique used in the all-to-all personalized exchange algorithms for this class of multistage networks for general permutation routing. The implementation is optimal in the number of passes and near-optimal in network transmission time.
Citations: 8
Partitioning unstructured meshes for homogeneous and heterogeneous parallel computing environments
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040887
Peizong Lee, Jan-Jan Wu, Chih Chang
Partitioning meshes is a preprocessing step for parallel scientific simulation. The quality of a partitioning is measured by load balance and communication overhead. The effectiveness of a partitioning significantly influences the performance of parallel computation. In this paper, we propose a quadtree spatial-based domain decomposition method for partitioning unstructured meshes. The background quadtree, which was originally used to represent the density distribution among elements within the computing domain, can be used to obtain an initial partitioning and to do multi-level refinement. As the quadtree implicitly defines a hierarchical relationship, which is a natural way to define coarsening and uncoarsening phases, we can repeatedly apply coarsening, partitioning, and uncoarsening multilevel refinement phases until no improvement can be made. Thus, for most cases, the partitioning results produced by our method are better than those produced by other graph-based partitioning methods. Experimental studies for the NACA0012 airfoil, the NASA EET wing, and an artillery shell within a shock tube are reported.
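The idea of deriving an initial partitioning from a background quadtree can be sketched in Python as follows. This is an illustrative assumption, not the authors' method: the functions `build_leaves` and `initial_partition`, the leaf `capacity`, the fixed quadrant order, and the rule of handing whole leaves to partitions in traversal order are all hypothetical, and the refinement phases from the abstract are omitted.

```python
def build_leaves(points, box, capacity=4, depth=0, max_depth=16):
    """Return the non-empty quadtree leaves (lists of points) in a fixed quadrant order."""
    if len(points) <= capacity or depth == max_depth:
        return [points] if points else []
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    order = [(0, 0), (1, 0), (0, 1), (1, 1)]
    quads = {key: [] for key in order}
    for (x, y) in points:
        quads[(int(x >= xm), int(y >= ym))].append((x, y))
    boxes = {(0, 0): (x0, y0, xm, ym), (1, 0): (xm, y0, x1, ym),
             (0, 1): (x0, ym, xm, y1), (1, 1): (xm, ym, x1, y1)}
    leaves = []
    for key in order:
        leaves += build_leaves(quads[key], boxes[key], capacity, depth + 1, max_depth)
    return leaves


def initial_partition(points, box, parts, capacity=4):
    """Assign whole leaves to partitions in traversal order, aiming at equal sizes."""
    leaves = build_leaves(points, box, capacity)
    target = len(points) / parts
    result = [[] for _ in range(parts)]
    current = 0
    for leaf in leaves:
        # Move on once the current partition has reached its share of the load.
        if current < parts - 1 and len(result[current]) >= target:
            current += 1
        result[current].extend(leaf)
    return result
```

Because leaves are visited quadrant by quadrant, points that are close in space tend to land in the same partition, which is the property that keeps communication between partitions low. Here the points stand in for mesh element centroids, which is also an assumption of the sketch.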
Citations: 2
An online heuristic for data placement in computer systems with active disks
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040877
S. Adabala, J. Fortes
In this paper, an online heuristic is proposed and evaluated for managing the dynamic memory in a computer system with active disks by physically colocating, in disk memory or main memory, the data pages accessed by a computation slice. This enables a runtime system to offload the corresponding computation slice to the appropriate processing unit at the disk memory or main memory. A modified version of SEQUITUR, an online compression algorithm, is used to identify the affinity among sets of pages in a virtual memory page reference stream, and a page allocation and replacement policy is presented. The sets of pages identified are shown to closely match the sets of pages referenced by computation slices, using a suite of data access kernels as benchmarks. The paging policy is evaluated with page traces of micro benchmarks and real applications. In memory-constrained environments, with additional memory at the disk, most of the benchmarks see improved performance due to fewer page faults. The paging heuristic can colocate 50% of the affinity sets on average and can offload up to 100% of the computation to disk.
Citations: 0
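The abstract above says that page affinity is found by running a modified SEQUITUR over the page reference stream. SEQUITUR builds its grammar from adjacent symbol pairs (digrams) that repeat, so a rough way to see why repetition signals affinity is to count repeated adjacent page pairs. The sketch below is only that simplified counting step, not SEQUITUR itself and not the authors' modified version; the function name `repeated_digrams`, the `min_count` threshold, and the sample trace are all hypothetical.

```python
from collections import Counter


def repeated_digrams(page_refs, min_count=2):
    """Count adjacent page pairs in a reference stream.

    Returns only the pairs seen at least `min_count` times. This is a
    simplified stand-in for the digram bookkeeping that SEQUITUR performs;
    it builds no grammar and does not handle overlapping repeats specially.
    """
    pairs = Counter(zip(page_refs, page_refs[1:]))
    return {pair: count for pair, count in pairs.items() if count >= min_count}


if __name__ == "__main__":
    # Hypothetical page reference trace: pages 1 and 2 are accessed together.
    trace = [1, 2, 3, 1, 2, 4, 1, 2]
    print(repeated_digrams(trace))
```

Pairs that recur, such as pages 1 and 2 in the sample trace, are the kind of candidates a placement policy might try to keep in the same memory.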
Hypercube network fault tolerance: a probabilistic approach
Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040860
Jianer Chen, Iyad A. Kanj, Guojun Wang
Extensive experience has shown that hypercube networks are highly fault tolerant. What is frustrating is that it seems very difficult to properly formulate and formally prove this important fact, despite extensive research efforts in the past two decades. Most proposed fault tolerance models for hypercube networks are only able to characterize very rare extreme situations, thus significantly underestimating the fault tolerance power of hypercube networks, while for more realistic fault tolerance models the analysis becomes much more complicated. We develop new techniques to analyze a realistic fault tolerance model and derive lower bounds for the probability of hypercube network fault tolerance. Our results are both theoretically significant and practically important. Theoretically, our method offers very general and powerful techniques for formally proving lower bounds on the probability of network connectivity, while practically, our results provide formally proven and precisely given upper bounds on node failure probabilities for manufacturers to achieve a desired probability for network connectivity. Our techniques are also useful for analysis of the performance of routing algorithms.
Citations: 42
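The quantity the abstract bounds, the probability that a hypercube stays connected when nodes fail, can be estimated empirically for small dimensions. The sketch below is a Monte Carlo simulation, not the paper's analytical technique, and it proves no bound. It assumes each node fails independently with the same probability `p`, and it treats a network with no surviving nodes as not connected; both are assumptions of this example. The function names are hypothetical.

```python
import random


def hypercube_connected(n, faulty):
    """Return True if the non-faulty nodes of the n-cube form one connected component.

    Nodes are the integers 0 .. 2**n - 1; two nodes are adjacent when their
    labels differ in exactly one bit. An empty set of healthy nodes is
    treated as not connected (a convention chosen for this sketch).
    """
    healthy = [v for v in range(1 << n) if v not in faulty]
    if not healthy:
        return False
    seen = {healthy[0]}
    stack = [healthy[0]]
    while stack:  # depth-first search over healthy nodes only
        v = stack.pop()
        for bit in range(n):
            w = v ^ (1 << bit)  # flip one bit to reach a neighbour
            if w not in faulty and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(healthy)


def estimate_connectivity(n, p, trials=2000, seed=0):
    """Estimate the probability that the n-cube stays connected when each
    node fails independently with probability p (assumed failure model)."""
    rng = random.Random(seed)
    connected = 0
    for _ in range(trials):
        faulty = {v for v in range(1 << n) if rng.random() < p}
        if hypercube_connected(n, faulty):
            connected += 1
    return connected / trials


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10):
        print(p, estimate_connectivity(6, p))
```

A simulation like this only samples the probability for one dimension and one failure rate at a time, which is why formally proven bounds of the kind the abstract describes are needed for guarantees.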