
2008 International Conference on Parallel Processing - Workshops: Latest Publications

Interconnected Traffic with Real Mobility Tool for Ad Hoc Networks
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.32
A. Doci
Due to the slow deployment of ad hoc networks, their protocol performance is mainly measured in simulation environments using synthetic mobility and traffic models. The synthetic mobility and traffic models are designed independently of each other and work under the assumption that wireless nodes start in the simulation and remain in it for a user-specified simulation time. This paper shows that mobility and traffic are interconnected. We present the implementation of an interconnected traffic tool and show that, under real mobility and interconnected traffic, the performance metrics need to be rethought. We therefore propose availability as a new performance metric and evaluate protocol performance under synthetic and real mobility models. We offer the code to anyone interested.
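As a rough illustration of the proposed metric, the sketch below computes one plausible notion of availability: the fraction of the observation window during which both endpoints of a flow are simultaneously present in a mobility trace. The interval format and this particular definition are assumptions made for illustration; the authors' tool may define availability differently.

    # Availability as the fraction of the window in which two nodes coexist
    # (illustrative definition, not necessarily the paper's).

    def overlap(intervals_a, intervals_b):
        """Total time during which both nodes are present; intervals are (start, end) tuples."""
        total = 0.0
        for a0, a1 in intervals_a:
            for b0, b1 in intervals_b:
                total += max(0.0, min(a1, b1) - max(a0, b0))
        return total

    def availability(presence, src, dst, sim_time):
        """Fraction of the simulation window in which src and dst are both present."""
        return overlap(presence[src], presence[dst]) / sim_time

    if __name__ == "__main__":
        # Hypothetical presence intervals extracted from a real mobility trace.
        presence = {
            "n1": [(0.0, 400.0), (600.0, 900.0)],
            "n2": [(100.0, 700.0)],
        }
        print(availability(presence, "n1", "n2", sim_time=900.0))  # -> 0.444...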
Citations: 3
A Novel Mobility Management Scheme for IEEE 802.11-Based Wireless Mesh Networks
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.22
Zhenxia Zhang, A. Boukerche
Recent advances in Wireless Mesh Networks (WMNs) have overcome the drawbacks of traditional wired networks and wireless ad hoc networks, and WMNs promise to play a major role in the next generation of networks. Mobility management is one of the most significant management services for WMNs. Because of the inherent characteristics of WMNs, such as relatively static backbones and highly mobile clients, the question of how to provide seamless mobility management is a driving force behind research. In this paper, a novel intra-domain mobility management scheme for WMNs is presented. A hybrid routing algorithm is used to forward packets, and during handoff gratuitous ARP messages distribute the new routing information, thus avoiding re-routing and location updates. The scheme can support real-time applications over 802.11 WMNs, such as VoIP.
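To make the handoff mechanism concrete, the following sketch (assuming the scapy library and root privileges) shows the kind of gratuitous ARP a mesh client could emit after re-associating with a new mesh router, so that upstream nodes refresh their ARP caches without a full re-routing or location update. The interface name and addresses are placeholders, and the paper's actual implementation may construct and trigger the message differently.

    # Gratuitous ARP announcement after handoff (sketch; requires scapy and root).

    from scapy.all import ARP, Ether, sendp

    def send_gratuitous_arp(iface, client_ip, client_mac):
        # op=2 (is-at) with psrc == pdst announces our own IP-to-MAC binding.
        arp = ARP(op=2, psrc=client_ip, pdst=client_ip,
                  hwsrc=client_mac, hwdst="ff:ff:ff:ff:ff:ff")
        frame = Ether(src=client_mac, dst="ff:ff:ff:ff:ff:ff") / arp
        sendp(frame, iface=iface, verbose=False)

    if __name__ == "__main__":
        # Placeholder interface and addresses for illustration only.
        send_gratuitous_arp("wlan0", "192.168.1.42", "00:11:22:33:44:55")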
Citations: 8
An Analysis of QoS Provisioning for Sockets Direct Protocol vs. IPoIB over Modern InfiniBand Networks
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.25
Ryan E. Grant, Mohammad J. Rashti, A. Afsahi
The introduction of quality of service (QoS) features for socket-based communication over InfiniBand networks provides the opportunity to enact service differentiation for traditional socket-based applications over high performance networks for the first time. The effectiveness of such techniques in providing control over the quality of service that individual connections experience is important in managing traffic in modern data centers. In this paper, we quantitatively analyze the performance benefits of QoS provisioning in InfiniBand networks for sockets direct protocol (SDP) and IPoIB. We find that QoS provisioning can provide prioritized service for sockets-based streams, with more apparent impact on SDP traffic than IPoIB.
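For readers who want to reproduce this kind of comparison at the application level, the sketch below is a generic TCP throughput micro-benchmark: run the server on one host and the client on another, once over the IPoIB interface address and once with an SDP preload (for example, libsdp), and compare the reported bandwidth. This is an illustrative harness only, not the instrumentation used in the paper.

    # Minimal socket throughput benchmark (sketch): "python bench.py server"
    # on one host, "python bench.py client <server-host>" on the other.

    import socket, sys, time

    BUF = 64 * 1024          # per-send buffer size
    TOTAL = 1 << 30          # bytes to transfer (1 GiB)

    def server(port=5001):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        received = 0
        while received < TOTAL:
            chunk = conn.recv(BUF)
            if not chunk:
                break
            received += len(chunk)
        conn.close()

    def client(host, port=5001):
        sock = socket.create_connection((host, port))
        payload = b"\0" * BUF
        sent, start = 0, time.time()
        while sent < TOTAL:
            sock.sendall(payload)
            sent += BUF
        sock.close()
        elapsed = time.time() - start
        print(f"{sent * 8 / elapsed / 1e9:.2f} Gb/s")

    if __name__ == "__main__":
        server() if sys.argv[1] == "server" else client(sys.argv[2])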
Citations: 17
Understanding Locality-Awareness in Peer-to-Peer Systems
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.15
Xiongfei Weng, Hongliang Yu, G. Shi, Jing Chen, Xu Wang, Jing Sun, Weimin Zheng
Locality-awareness is one of the essential characteristics of peer-to-peer (P2P) systems. Recently, many locality-aware algorithms have been proposed, in which locality is defined in terms of different network metrics. In this paper, we compare the different performance optimization goals of peer users and ISPs, and then present a detailed simulation study to explore how locality-aware algorithms based on different network metrics influence the performance of real P2P systems. Two widely deployed P2P systems, BitTorrent (a content-distribution system) and CoolStreaming (a media streaming system), are tested against a real data set from PlanetLab in our extensive simulations. Experimental results suggest that selecting neighbors within the same AS is desirable: it can decrease user-experienced delays and keep traffic local.
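The neighbor-selection policy the results argue for can be sketched as follows: prefer candidate peers in the requester's own AS and fill any remaining slots with remote peers. The peer-to-AS mapping is an assumed input (for example, from an IP-to-AS database), and the deployed systems' actual policies differ in detail.

    # AS-aware neighbor selection (illustrative sketch).

    import random

    def pick_neighbors(my_as, candidates, peer_as, k):
        """candidates: iterable of peer ids; peer_as: dict mapping peer id -> AS number."""
        local = [p for p in candidates if peer_as.get(p) == my_as]
        remote = [p for p in candidates if peer_as.get(p) != my_as]
        random.shuffle(local)
        random.shuffle(remote)
        # Same-AS peers first, then remote peers to fill up to k slots.
        return (local + remote)[:k]

    if __name__ == "__main__":
        peer_as = {"a": 4134, "b": 4134, "c": 7018, "d": 3356, "e": 4134}
        print(pick_neighbors(4134, peer_as.keys(), peer_as, k=3))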
Citations: 12
OpenMPD: A Directive-Based Data Parallel Language Extension for Distributed Memory Systems
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.28
Jinpil Lee, M. Sato, T. Boku
OpenMPD is a language extension for programming on distributed memory systems that keeps its notation minimal and simple. Although MPI is the de facto standard for parallel programming on distributed memory systems, writing MPI programs is often a time-consuming and complicated process. OpenMPD supports typical parallelization based on the data parallel paradigm and work sharing, and enables parallelizing the original sequential code with minimal modification using simple directives, in the style of OpenMP. For flexibility, it can also be combined with explicit MPI and OpenMP coding for more complicated parallel codes. Experimental results from our implementation show that OpenMPD achieves three to eight times speed-up on a PC cluster with eight processors, given a small modification to the original sequential code.
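OpenMPD's directives are C/Fortran pragmas, so rather than guess at their syntax, the sketch below shows only the underlying work-sharing idea in Python: block-distributing N loop iterations over P processes, which is the kind of decomposition such a data-parallel directive generates behind an MPI program.

    # Block distribution of loop iterations across ranks (illustrative sketch).

    def block_range(n, nprocs, rank):
        """Iterations [start, end) owned by `rank` under a near-even block distribution."""
        base, extra = divmod(n, nprocs)
        start = rank * base + min(rank, extra)
        end = start + base + (1 if rank < extra else 0)
        return start, end

    if __name__ == "__main__":
        n, nprocs = 10, 4
        for rank in range(nprocs):
            print(rank, block_range(n, nprocs, rank))
        # rank 0 -> (0, 3), rank 1 -> (3, 6), rank 2 -> (6, 8), rank 3 -> (8, 10)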
Citations: 7
TCP/IP Performance Near I/O Bus Bandwidth on Multi-Core Systems: 10-Gigabit Ethernet vs. Multi-Port Gigabit Ethernet
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.33
Hyun-Wook Jin, Yeon-Ji Yun, Hye-Churn Jang
With significant advances in network interfaces, the I/O bus, and end-node processor architectures, innovative approaches are required to achieve high network bandwidth by fully utilizing the available system resources. The related issues can be summarized in two points: (i) utilizing I/O bus bandwidth for high-bandwidth network connections and (ii) utilizing multiple cores for high packet-processing throughput. In this paper, we conduct several experiments on a multi-core system with 10 GigE and multi-port 1 GigE network interfaces. We aim to show the impact of system configurations on network performance and to compare the performance of the two network interfaces. The experimental results show that, with proper interrupt affinity configurations, the multi-port 1 GigE can achieve bandwidth comparable to 10 GigE. The peak bandwidth achieved by the multi-port 1 GigE is 6.7 Gbps, more than 80% of the theoretical maximum I/O bus bandwidth of the experimental system. We also show, however, that the multi-port 1 GigE can consume much more processor resources than 10 GigE. More importantly, we reveal that processing packets on many cores can result in more resource consumption without much benefit. This can be attributed to locking overhead between softirqs running on different cores and lower cache efficiency. We show that further configuration tuning cannot overcome this side effect.
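A minimal sketch (Linux, root required) of the interrupt-affinity pinning this kind of experiment relies on: write a CPU bitmask to /proc/irq/<irq>/smp_affinity so that each NIC port's interrupts are serviced by a fixed core. The IRQ numbers and core choices below are placeholders, not the paper's configuration.

    # Pin each NIC IRQ to a specific core via the procfs interface (sketch).

    def set_irq_affinity(irq, cpu):
        mask = 1 << cpu                       # one-hot CPU bitmask
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
            f.write(f"{mask:x}\n")

    if __name__ == "__main__":
        # Hypothetical mapping: IRQs of four 1 GigE ports pinned to cores 0-3.
        for irq, cpu in [(56, 0), (57, 1), (58, 2), (59, 3)]:
            set_irq_affinity(irq, cpu)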
Citations: 7
Scheduling Task Graphs on Heterogeneous Multiprocessors with Reconfigurable Hardware
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.39
J. Teller, F. Özgüner, R. Ewing
We address the problem of scheduling applications represented as directed acyclic task graphs (DAGs) onto architectures with reconfigurable processing cores. We introduce the Mutually Exclusive Processor Groups reconfiguration model, a novel model that captures many different modes of reconfiguration. Additionally, we propose the Heterogeneous Earliest Finish Time with Mutually Exclusive Processor Groups (HEFT-MEG) scheduling heuristic, which uses this reconfiguration model. HEFT-MEG schedules reconfigurations using a novel back-tracking algorithm to evaluate how different reconfiguration decisions affect previously scheduled tasks. When choosing configurations, HEFT-MEG's goal is to select the most efficient configuration for each application phase. In simulation, HEFT-MEG generates higher quality schedules than those generated by the hardware-software co-scheduler proposed by Mei, et al. [21] and by HEFT [31] using a single configuration.
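For orientation, the sketch below implements the plain HEFT-style list scheduling that HEFT-MEG extends, without the reconfiguration model or the back-tracking step that are the paper's contribution: rank tasks by upward rank, then greedily place each on the processor that minimizes its earliest finish time. The task graph and costs are toy values.

    # Simplified HEFT list scheduler (sketch; no reconfiguration, no slot insertion).

    def upward_rank(task, succ, wcost, ccost, memo):
        if task in memo:
            return memo[task]
        avg_w = sum(wcost[task]) / len(wcost[task])
        best = 0.0
        for s in succ.get(task, []):
            best = max(best, ccost.get((task, s), 0) + upward_rank(s, succ, wcost, ccost, memo))
        memo[task] = avg_w + best
        return memo[task]

    def heft(tasks, succ, pred, wcost, ccost, nprocs):
        memo = {}
        order = sorted(tasks, key=lambda t: upward_rank(t, succ, wcost, ccost, memo), reverse=True)
        proc_free = [0.0] * nprocs
        sched = {}                                   # task -> (proc, start, finish)
        for t in order:
            best = None
            for p in range(nprocs):
                ready = 0.0
                for q in pred.get(t, []):
                    qp, _, qf = sched[q]
                    # Add communication cost only when predecessor ran on another processor.
                    ready = max(ready, qf + (0 if qp == p else ccost.get((q, t), 0)))
                start = max(ready, proc_free[p])
                finish = start + wcost[t][p]
                if best is None or finish < best[2]:
                    best = (p, start, finish)
            sched[t] = best
            proc_free[best[0]] = best[2]
        return sched

    if __name__ == "__main__":
        tasks = ["A", "B", "C", "D"]
        succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
        pred = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
        wcost = {"A": [2, 3], "B": [4, 2], "C": [3, 3], "D": [2, 1]}   # per-processor costs
        ccost = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 2, ("C", "D"): 1}
        for t, (p, s, f) in heft(tasks, succ, pred, wcost, ccost, nprocs=2).items():
            print(t, "on P%d" % p, s, "->", f)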
Citations: 7
Performance Analysis and Optimization of Parallel Scientific Applications on CMP Cluster Systems
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.21
Xingfu Wu, V. Taylor, Charles W. Lively, S. Sharkawi
Chip multiprocessors (CMPs) are widely used for high performance computing, and these CMPs are being configured hierarchically to compose the nodes of cluster systems. A major challenge is the efficient use of such cluster systems for large-scale scientific applications. In this paper, we quantify the performance gap resulting from using different numbers of processors per node; this information provides a baseline for the amount of optimization needed when using all processors per node on CMP clusters. We conduct detailed performance analysis to identify how applications can be modified to efficiently utilize all processors per node on CMP clusters, focusing on two scientific applications: a 3D particle-in-cell magnetic fusion application, the gyrokinetic toroidal code (GTC), and a lattice Boltzmann method for simulating fluid dynamics (LBM). In terms of refinements, we use conventional techniques such as cache blocking, loop unrolling and loop fusion, and develop hybrid methods for optimizing MPI_Allreduce and MPI_Reduce. Using these optimizations, the application performance when utilizing all processors per node was improved by up to 18.97% for GTC and 15.77% for LBM on up to 2048 total processors on the CMP clusters.
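The hybrid reduction idea can be sketched with mpi4py (an assumption for illustration; the authors' implementation is not specified here): reduce within each node's shared-memory communicator first, perform the allreduce only among per-node leaders, then broadcast the result back inside each node.

    # Hierarchical (intra-node, then inter-node) allreduce sketch using mpi4py.

    from mpi4py import MPI

    def hybrid_allreduce(value, comm=MPI.COMM_WORLD):
        node = comm.Split_type(MPI.COMM_TYPE_SHARED)            # ranks sharing a node
        node_sum = node.reduce(value, op=MPI.SUM, root=0)       # intra-node reduce
        leaders = comm.Split(0 if node.Get_rank() == 0 else MPI.UNDEFINED,
                             comm.Get_rank())                   # one leader per node
        if node.Get_rank() == 0:
            node_sum = leaders.allreduce(node_sum, op=MPI.SUM)  # inter-node allreduce
        return node.bcast(node_sum, root=0)                     # fan out within node

    if __name__ == "__main__":
        comm = MPI.COMM_WORLD
        total = hybrid_allreduce(comm.Get_rank() + 1.0)
        if comm.Get_rank() == 0:
            print("sum =", total)   # equals N*(N+1)/2 for N ranks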
Citations: 24
A Fuzzy-Based Handover System for Avoiding Ping-Pong Effect in Wireless Cellular Networks
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.11
L. Barolli, F. Xhafa, A. Durresi, A. Koyama
Many handover algorithms have been proposed in the literature; however, making good handover decisions while maintaining QoS in wireless networks is difficult. In this paper, we propose a new handover system based on fuzzy logic. The proposed system uses three parameters for the handoff decision: the change in signal strength of the present Base Station (BS), the signal strength from the neighbor BS, and the distance between the Mobile Station (MS) and the BS. Performance evaluation via simulations shows that the proposed system can avoid the ping-pong effect and makes good handover decisions.
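A simplified sketch of a fuzzy-style decision over the three inputs named in the abstract is given below. The triangular membership functions, the single aggregated rule, and the threshold are illustrative assumptions, not the paper's rule base.

    # Toy fuzzy handover decision over the three inputs from the abstract.

    def tri(x, a, b, c):
        """Triangular membership: 0 at a and c, 1 at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def handover_score(current_dbm_drop, neighbor_dbm, distance_m):
        weak_current = tri(current_dbm_drop, 0, 15, 30)      # dB drop of serving BS signal
        strong_neighbor = tri(neighbor_dbm, -90, -60, -30)   # neighbor BS RSSI in dBm
        far_from_bs = tri(distance_m, 0, 800, 1600)          # distance MS-to-serving-BS
        # Single aggregated rule: hand over when all three conditions hold to some degree.
        return min(weak_current, strong_neighbor, far_from_bs)

    if __name__ == "__main__":
        score = handover_score(current_dbm_drop=12, neighbor_dbm=-65, distance_m=700)
        print(score, "-> handover" if score > 0.5 else "-> stay")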
Citations: 35
A Fault Tolerance Scheme for Hierarchical Dynamic Schedulers in Grids
Pub Date : 2008-09-08 DOI: 10.1109/ICPP-W.2008.7
Nitin B. Gorde, S. Aggarwal
In dynamic grid environments, failures (e.g., link or resource failures) are frequent. We present a fault tolerance scheme for a hierarchical dynamic scheduler (HDS) for grid workflow applications. In HDS all resources are arranged in a hierarchy tree and each resource acts as a scheduler. The fault tolerance scheme is fully distributed and is responsible for maintaining the hierarchy tree in the presence of failures. Our scheme handles root failures specially, which avoids the root becoming a single point of failure. The resources that detect failures are responsible for taking appropriate actions. Our scheme uses randomization to cope with multiple simultaneous failures. Simulation results show that the recovery process is fast and that failures have minimal effect on the scheduling process.
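As a toy illustration of the recovery idea, the sketch below has a scheduler node that detects an unreachable parent wait a small randomized backoff (to de-synchronize nodes reacting to the same failure) and re-attach under its nearest live ancestor. The data structures and the re-attachment policy are assumptions for illustration, not the paper's protocol.

    # Randomized recovery of a hierarchy tree after a parent failure (sketch).

    import random

    class Node:
        def __init__(self, name, parent=None):
            self.name, self.parent, self.alive = name, parent, True

    def recover(node, rng=random.random):
        backoff = rng() * 0.5                      # delay a real node would wait before acting
        # Walk up past failed ancestors to the nearest live one (root failures are
        # assumed to be handled by a separate mechanism, per the abstract).
        candidate = node.parent
        while candidate is not None and not candidate.alive:
            candidate = candidate.parent
        node.parent = candidate
        return backoff, candidate

    if __name__ == "__main__":
        root = Node("root")
        mid = Node("mid", parent=root)
        leaf = Node("leaf", parent=mid)
        mid.alive = False                          # simulate a failure
        delay, new_parent = recover(leaf)
        print(f"leaf re-attached to {new_parent.name} after {delay:.3f}s backoff")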
Citations: 20