{"title":"Partial replica selection for spatial datasets","authors":"Yun Tian, P. J. Rhodes","doi":"10.1109/eScience.2012.6404473","DOIUrl":null,"url":null,"abstract":"The implementation of partial or incomplete replicas, which represent only a subset of a larger dataset, has been an active topic of research. Partial Spatial Replicas extend this functionality to spatial data, allowing us to distribute a spatial dataset in pieces over several locations. Accessing only a subset of a spatial replica usually results in a large number of relatively small read requests made to the underlying storage device. For this reason, an accurate model of disk access is important when working with spatial subsets. We make two primary contributions in this paper. First, we describe a model for disk access performance that takes filesystem prefetching into account and is sufficiently accurate for spatial replica selection. Second, making a few simplifying assumptions, we propose a fast replica selection algorithm for partial spatial replicas. The algorithm uses a greedy approach that attempts to maximize performance by choosing a collection of replica subsets that allow fast data retrieval by a client machine. Experiments show that the performance of the solution found by our algorithm is on average always at least 91% and 93.4% of the performance of the optimal solution in 4-node and 8-node tests respectively.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"59 1 1","pages":"1-10"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 8th International Conference on E-Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/eScience.2012.6404473","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
The implementation of partial or incomplete replicas, which represent only a subset of a larger dataset, has been an active topic of research. Partial Spatial Replicas extend this functionality to spatial data, allowing us to distribute a spatial dataset in pieces over several locations. Accessing only a subset of a spatial replica usually results in a large number of relatively small read requests made to the underlying storage device. For this reason, an accurate model of disk access is important when working with spatial subsets. We make two primary contributions in this paper. First, we describe a model for disk access performance that takes filesystem prefetching into account and is sufficiently accurate for spatial replica selection. Second, making a few simplifying assumptions, we propose a fast replica selection algorithm for partial spatial replicas. The algorithm uses a greedy approach that attempts to maximize performance by choosing a collection of replica subsets that allow fast data retrieval by a client machine. Experiments show that the performance of the solution found by our algorithm is on average always at least 91% and 93.4% of the performance of the optimal solution in 4-node and 8-node tests respectively.