Palantir: Reseizing Network Proximity in Large-Scale Distributed Computing Frameworks Using SDN
Ze Yu, Min Li, Xin Yang, Xiaolin Li
2014 IEEE 7th International Conference on Cloud Computing (published 2014-06-27)
DOI: 10.1109/CLOUD.2014.66
Citation count: 10
Abstract
Parallel and distributed computing frameworks such as MapReduce and Dryad have been widely adopted to analyze massive data sets. Traditionally, these frameworks depend on manual configuration to acquire the network proximity information they use to optimize data placement and task scheduling. However, this approach is cumbersome, inflexible, or even infeasible in large-scale deployments, for example, across multiple datacenters. In this paper, we address this problem by leveraging Software-Defined Networking (SDN). We build Palantir, an SDN service tailored to parallel/distributed computing frameworks that abstracts proximity information out of the network. Palantir frees framework developers and administrators from having to manually configure network proximity information. In addition, Palantir is flexible because it allows each framework to define proximity according to its own framework-specific metrics. We design and implement a datacenter-aware MapReduce to demonstrate Palantir's usefulness. Our evaluation shows that, based on Palantir, datacenter-aware MapReduce achieves significant performance improvements.
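The abstract does not describe Palantir's actual interface. As a rough illustration of the idea that each framework supplies its own proximity metric over network-derived location data, here is a minimal Python sketch. Everything in it is hypothetical: `Location`, `datacenter_metric`, `rack_metric`, `pick_host`, and the host and datacenter names are invented for illustration, and the topology is hard-coded rather than obtained from an SDN controller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Location:
    """Hypothetical per-host location record, e.g. as an SDN controller might report it."""
    datacenter: str
    rack: str


# Hypothetical metric type: maps two locations to a distance (smaller means closer).
ProximityMetric = Callable[[Location, Location], int]


def datacenter_metric(a: Location, b: Location) -> int:
    """Datacenter-aware metric: 0 within one datacenter, 1 across datacenters."""
    return 0 if a.datacenter == b.datacenter else 1


def rack_metric(a: Location, b: Location) -> int:
    """Rack-aware metric: 0 same rack, 1 same datacenter, 2 across datacenters."""
    if a.datacenter != b.datacenter:
        return 2
    return 0 if a.rack == b.rack else 1


def pick_host(free_hosts: List[str], replica_hosts: List[str],
              topology: Dict[str, Location], metric: ProximityMetric) -> str:
    """Return the free host closest, under `metric`, to any replica of a task's input."""
    def distance_to_data(host: str) -> int:
        return min(metric(topology[host], topology[r]) for r in replica_hosts)

    # min() keeps the first of several equally close hosts, so ties follow list order.
    return min(free_hosts, key=distance_to_data)


# Hard-coded example topology (hypothetical host and datacenter names).
topology = {
    "h1": Location("dc-east", "r1"),
    "h2": Location("dc-east", "r2"),
    "h3": Location("dc-west", "r1"),
    "h4": Location("dc-east", "r1"),
}

print(pick_host(["h3", "h2"], ["h1"], topology, datacenter_metric))  # → h2
print(pick_host(["h2", "h4"], ["h1"], topology, datacenter_metric))  # → h2 (tie)
print(pick_host(["h2", "h4"], ["h1"], topology, rack_metric))        # → h4
```

Swapping the metric changes which host is chosen for the same input data and free slots, which is the kind of framework-specific flexibility the abstract attributes to Palantir. A datacenter-aware scheduler only needs to distinguish "same datacenter" from "remote datacenter", while a rack-aware one also prefers hosts on the replica's rack.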