Brennan Bell, T. Dinter, Vlad Merticariu, B. P. Huu, D. Misev, P. Baumann
Navigating Sea-Ice Timeseries Data using Tracklines
2018 IEEE 14th International Conference on e-Science (e-Science), p. 392. Published 2018-10-01.
DOI: 10.1109/eScience.2018.00115 (https://doi.org/10.1109/eScience.2018.00115)
Citation count: 0
Abstract
Scientists are often interested in sampling buffered regions of data across multiple time-slices in array datacubes. For instance, in studying sea-ice distributions, a sequence of timestamped geographic coordinates is requested, representing a sample or the ship trackline of a measurement campaign. A defined region is sampled around each of those data points using a nearest-neighbour approach in time and a buffer or polygon clipping in the spatial domain. Such queries can be handled discretely across the time domain, as there is no temporal interpolation; as a result, the tiling of the extracted rasters is well-defined by the tiling of the source data. What happens when the resulting object should also be represented by a 3-D raster, as in the case where the trackline consists of continuous buffered sampling across the timeseries? Spatio-temporal data is typically stored in chunked 3-D arrays, where multiple time-slices appear in the same "tile" or subarray. Unlike the discrete version, tracing out a polygonally shaped buffer along a ship’s path in a 3-D spatio-temporal datacube leads to shearing across the spatial tiles in the result raster, and this shearing prevents an a priori tiling of the result. Here, we present several approaches to tiling the result raster, and we provide a mathematical investigation of the impact these approaches can have on performance. To substantiate the theoretical investigation, we provide an implementation and performance benchmarks of the different tiling approaches, and we demonstrate the implementation on sea-ice data as a case study. As future work, we discuss different approaches to parallelization that use these techniques as a basis for thread-safety, establishing the results on arbitrary R+ trees and extending them to R* trees.
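
The sampling scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `nearest_slice` and `tiles_touched`, the integer pixel coordinates, the square buffer (standing in for a general polygon), and the tile shape are all assumptions made for illustration. Each track point is assigned to its nearest time-slice, and the tiles of a chunked 3-D array that its buffer overlaps are listed.

```python
from bisect import bisect_left


def nearest_slice(timestamps, t):
    """Index of the time-slice closest to t (timestamps must be sorted)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Ties go to the earlier slice.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def tiles_touched(track, timestamps, radius, tile_shape):
    """For each (t, y, x) track point, return the set of (t, y, x) tile
    indices overlapped by a square buffer of the given pixel radius.

    Hypothetical helper: assumes integer pixel coordinates and a regular
    tiling of shape (slices, rows, columns) per tile.
    """
    tt, ty, tx = tile_shape
    touched = []
    for t, y, x in track:
        k = nearest_slice(timestamps, t)
        y0, y1 = max(y - radius, 0), y + radius
        x0, x1 = max(x - radius, 0), x + radius
        touched.append({
            (k // tt, j, i)
            for j in range(y0 // ty, y1 // ty + 1)
            for i in range(x0 // tx, x1 // tx + 1)
        })
    return touched


timestamps = [0, 10, 20, 30]                      # one entry per time-slice
track = [(1, 5, 5), (12, 5, 11), (29, 5, 25)]     # (time, row, column)
for point, tiles in zip(track, tiles_touched(track, timestamps, 2, (2, 10, 10))):
    print(point, sorted(tiles))
```

In this toy setup the first two points fall into the same temporal tile (slices 0 and 1) but their buffers cover different spatial tiles, which is a small-scale version of the shearing the abstract describes.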