Joaquín Chung, Zhengchun Liu, R. Kettimuthu, Ian T Foster
{"title":"Toward an Elastic Data Transfer Infrastructure","authors":"Joaquín Chung, Zhengchun Liu, R. Kettimuthu, Ian T Foster","doi":"10.1109/eScience.2019.00036","DOIUrl":null,"url":null,"abstract":"Data transfer over wide area networks is an integral part of many science workflows that must, for example, move data from scientific facilities to remote resources for analysis, sharing, and storage. Yet despite continued enhancements in data transfer infrastructure (DTI), our previous analyses of approximately 40 billion GridFTP command logs collected over four years from the Globus transfer service show that data transfer nodes (DTNs) are idle (i.e., are performing no transfers) 94.3% of the time. On the other hand, we have also observed periods in which CPU resource scarcity negatively impacts DTN throughput. Motivated by the opportunity to optimize DTI performance, we present here an elastic DTI architecture in which the pool of nodes allocated to DTN activities expands and shrinks over time, based on demand. Our results show that this elastic DTI can save up to ~95% of resources compared with a typical static DTN deployment, with the median slowdown incurred remaining close to one for most of the evaluated scenarios.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"11 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 15th International Conference on eScience (eScience)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/eScience.2019.00036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Data transfer over wide area networks is an integral part of many science workflows that must, for example, move data from scientific facilities to remote resources for analysis, sharing, and storage. Yet despite continued enhancements in data transfer infrastructure (DTI), our previous analyses of approximately 40 billion GridFTP command logs collected over four years from the Globus transfer service show that data transfer nodes (DTNs) are idle (i.e., are performing no transfers) 94.3% of the time. On the other hand, we have also observed periods in which CPU resource scarcity negatively impacts DTN throughput. Motivated by the opportunity to optimize DTI performance, we present here an elastic DTI architecture in which the pool of nodes allocated to DTN activities expands and shrinks over time, based on demand. Our results show that this elastic DTI can save up to ~95% of resources compared with a typical static DTN deployment, with the median slowdown incurred remaining close to one for most of the evaluated scenarios.