Predicting sporadic grid data transfers
Sudharshan S. Vazhkudai, J. Schopf
Proceedings 11th IEEE International Symposium on High Performance Distributed Computing, 2002-07-24
DOI: https://doi.org/10.1109/HPDC.2002.1029918
The increasingly common practice of replicating datasets and using resources as distributed data stores in grid environments has led to the problem of determining which replica can be accessed most efficiently. Because the components in the end-to-end path linking these locations have diverse performance characteristics and varying loads, selecting one replica from among many requires accurate predictions of the data transfer times between sources and sinks. In this paper we present a prediction system that combines end-to-end application throughput observations with network load variations: the former capture whole-system performance, and the latter capture variations in load patterns. We develop a set of regression models to derive predictions that characterize the effect of network load variations on file transfer times. We apply these techniques to the GridFTP data movement tool, part of the Globus Toolkit™, and observe gains of up to 10% in prediction accuracy compared with approaches based on past system behavior in isolation.
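The general idea of regressing observed transfer throughput on a network load measurement can be illustrated with a minimal sketch. The abstract does not specify the form of the paper's regression models, so the single-variable least-squares fit below is an assumption chosen for illustration, not the authors' method. The function names (`fit_linear`, `predict_transfer_time`), the units, and all data values are hypothetical.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("probe values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict_transfer_time(file_size_mb, probe_value, model):
    """Predicted transfer time in seconds for a file of the given size.

    `model` is the (a, b) pair from fit_linear; `probe_value` is the
    current network load measurement (hypothetical units).
    """
    a, b = model
    throughput = a + b * probe_value  # predicted throughput in MB/s
    if throughput <= 0:
        raise ValueError("model predicts non-positive throughput")
    return file_size_mb / throughput


# Hypothetical paired history: a network probe reading taken near each
# past transfer, and the throughput (MB/s) that transfer achieved.
probe_history = [2.0, 4.0, 6.0, 8.0]
throughput_history = [5.0, 9.0, 13.0, 17.0]

model = fit_linear(probe_history, throughput_history)

# Baseline that uses past transfer behavior in isolation: the plain mean.
history_only_throughput = mean(throughput_history)

current_probe = 4.5
regression_time = predict_transfer_time(100.0, current_probe, model)
baseline_time = 100.0 / history_only_throughput
print(f"regression: {regression_time:.2f} s, history only: {baseline_time:.2f} s")
```

The history-only baseline ignores the current network state, whereas the regression adjusts the predicted throughput to the latest probe reading. A replica selector could compute such a prediction for each candidate source and pick the one with the smallest predicted time.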