
International Workshop on Analytics for Big Geospatial Data: Latest Papers

The price of generality in spatial indexing
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2534923
Bogdan Simion, Daniel N. Ilha, Angela Demke Brown, Ryan Johnson
Efficient indexing can significantly speed up the processing of large volumes of spatial data in many BigData applications. Many emerging spatial applications (e.g., biomedical imaging, genome analysis) have varying indexing requirements, so a unified indexing infrastructure that allows new indexing schemes to be implemented without knowledge of database internals is beneficial. However, designing a generic indexing framework is a challenging task. We study the issues with general indexing schemes such as GiST (used in PostGIS) and expose the tradeoff between generality and performance, showing that generality can be severely detrimental to performance if the abstractions are not carefully designed. Our experiments indicate that the GiST framework, as implemented in PostgreSQL/PostGIS, is 4.5-6x slower at filtering records through the index than a custom R-tree implementation. We also isolate the GiST-specific overhead by implementing the framework outside the DBMS, showing that the GiST-based R-tree is up to 2x slower than the raw R-tree algorithm that it uses internally. We conclude that although a generic framework for a wide range of spatial BigData application domains is desirable, implementers of new frameworks need to design the abstractions carefully to avoid paying a hefty performance penalty.
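The index filtering step discussed in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions: `Rect`, `overlaps`, `filter_direct`, and `filter_generic` are hypothetical names, the callback loosely mirrors the idea of a user-supplied consistency predicate in a generic index framework, and none of it is the PostgreSQL/PostGIS API or the authors' code. The abstract does not say where the measured overhead comes from, so the sketch shows only the structural difference between a hard-wired test and one reached through an abstraction.

```python
from typing import Callable, List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned minimum bounding rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlaps(a: Rect, b: Rect) -> bool:
    """Bounding-box overlap test; touching edges count as overlapping."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def filter_direct(entries: List[Tuple[str, Rect]], query: Rect) -> List[str]:
    """Specialised filter: the overlap test is hard-wired into the scan."""
    return [key for key, rect in entries if overlaps(rect, query)]


def filter_generic(entries: List[Tuple[str, Rect]], query: Rect,
                   consistent: Callable[[Rect, Rect], bool]) -> List[str]:
    """Generic filter: the predicate is supplied by the caller as a callback,
    so the scan itself knows nothing about rectangles."""
    return [key for key, rect in entries if consistent(rect, query)]
```

Both functions return the same keys when `overlaps` is passed as the callback; the generic version simply pays for one extra level of indirection per entry.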
Citations: 6
Breaking the big data barrier by enhancing on-board sensor flexibility
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2534926
P. Baumann, A. Dumitru, Vlad Merticariu, D. Misev, M. Rusu
Modern sensors, such as hyperspectral cameras, deliver massive amounts of data. On board satellites, this high volume is paired with low bandwidth and part-time availability, limited to overpasses. This leads to well-known availability problems and bottlenecks in today's remote sensing. We address this challenge by enhancing the on-board system with flexible filtering and processing capabilities based on the Array Analytics engine, rasdaman. Users can then request exactly the data they need, which can substantially decrease data traffic. Our project has been accepted for a CubeSat mission, for which rasdaman has now been prepared. We present the project setup and the core extensions made to rasdaman to this end.
Citations: 2
When big data meets big smog: a big spatio-temporal data framework for China severe smog analysis
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2534924
Jiaoyan Chen, Huajun Chen, Jeff Z. Pan, Ming Wu, Ningyu Zhang, Guozhou Zheng
Recently, severe smog has been afflicting many cities in China, including the capital Beijing. The chief culprit of China's smog, PM2.5, is affected by various factors including air pollutants, weather, climate, geographical location, and urbanization. To analyze these factors, we collect about 35,000,000 air quality records and about 30,000,000 weather records from sensors in 77 Chinese cities in 2013. Two big data sets, Geoname and DBPedia, are also incorporated for data on climate, geographical location, and urbanization. To deal with big spatio-temporal data for big smog analysis, we propose a MapReduce-based framework named BigSmog. It mainly conducts parallel correlation analysis of the factors and scalable training of artificial neural networks (ANNs) for spatio-temporal approximation of the PM2.5 concentration. In the experiments, BigSmog displays high scalability for big smog analysis with big spatio-temporal data. The analysis shows that air pollutants influence the short-term concentration of PM2.5 more than the weather does, and that geographical location and climate, rather than urbanization, play a major role in determining a city's long-term PM2.5 pollution level. Moreover, the trained ANNs can accurately approximate the concentration of PM2.5.
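A parallel correlation analysis of the kind mentioned in this abstract can be sketched in a map/reduce style. This is a minimal illustration under assumptions, not the BigSmog implementation: it assumes the Pearson coefficient is the correlation measure (the abstract does not say which one is used), and `map_partition`, `combine`, and `pearson` are hypothetical names. Each partition is reduced to six sums, the sums are added across partitions, and the coefficient is computed once at the end.

```python
from functools import reduce
from math import sqrt
from typing import Iterable, Tuple

Stats = Tuple[float, float, float, float, float, float]


def map_partition(pairs: Iterable[Tuple[float, float]]) -> Stats:
    """Map step: summarise one partition of (x, y) pairs as
    (n, sum x, sum y, sum x^2, sum y^2, sum x*y)."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return (n, sx, sy, sxx, syy, sxy)


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: partial sums from two partitions simply add up."""
    return tuple(u + v for u, v in zip(a, b))


def pearson(stats: Stats) -> float:
    """Final step: Pearson correlation from the combined sums.
    Assumes both variables have non-zero variance."""
    n, sx, sy, sxx, syy, sxy = stats
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / sqrt(var_x * var_y)


def correlate(partitions) -> float:
    """Run map, reduce, and the final computation over all partitions."""
    return pearson(reduce(combine, map(map_partition, partitions)))
```

Because the partial sums are associative, the map step can run independently on each partition, which is what makes this formulation suit a MapReduce job. For very large or poorly scaled inputs a numerically more stable update would be preferable to raw sums.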
Citations: 15
Parallel spatial query processing on GPUs using R-trees
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2534949
Simin You, Jianting Zhang, L. Gruenwald
R-Trees are popular spatial indexing structures that have been widely adopted in many geospatial applications. As commodity GPUs (Graphics Processing Units) become increasingly available on personal workstations and cluster computers, there is considerable research interest in applying massively data-parallel GPGPU (General Purpose computing on GPUs) technologies to index and query large-scale geospatial data on GPUs using R-Trees. In this study, we aim to evaluate the potential of accelerating both R-Tree bulk loading and spatial window query processing on GPUs. In addition to designing an efficient data layout schema for R-Trees on GPUs, we have implemented several parallel spatial window query processing techniques on GPUs using both dynamically generated R-Trees constructed on CPUs and bulk loaded R-Trees constructed on GPUs. Extensive experiments using both synthetic and real-world datasets have shown that our GPU-based parallel query processing techniques using R-Trees can achieve about 10X speedups on average over 8-core CPU parallel implementations by effectively utilizing the large numbers of processors and high memory bandwidth on GPUs.
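The combination of bulk loading, a flat data layout, and window queries described in this abstract can be illustrated with a CPU-only sketch. It makes several assumptions that are not from the paper: boxes are packed by sorting on their x-centre (a deliberately simple stand-in for a real bulk-loading method), each tree level is stored as a flat list of `(mbr, first_child, child_count)` tuples, and the query uses an explicit stack instead of recursion. All names are hypothetical, and nothing here reproduces the authors' GPU layout or kernels.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(boxes: List[Box]) -> Box:
    """Smallest box enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bulk_load(items: List[Box], fanout: int = 4):
    """Pack boxes bottom-up into flat per-level arrays.
    levels[0] is the leaf level; its children index into `order`."""
    if not items:
        raise ValueError("bulk_load needs at least one box")
    # Sort item indices by x-centre (sum of xmin and xmax orders the same way).
    order = sorted(range(len(items)), key=lambda i: items[i][0] + items[i][2])
    child_boxes = [items[i] for i in order]
    levels = []
    while True:
        nodes = []
        for start in range(0, len(child_boxes), fanout):
            group = child_boxes[start:start + fanout]
            nodes.append((union(group), start, len(group)))
        levels.append(nodes)
        if len(nodes) == 1:  # reached the root
            return order, levels
        child_boxes = [node[0] for node in nodes]


def window_query(items: List[Box], order, levels, window: Box) -> List[int]:
    """Return sorted indices of items whose box intersects the window."""
    hits = []
    stack = [(len(levels) - 1, 0)]  # (level, node index), starting at the root
    while stack:
        level, idx = stack.pop()
        mbr, first, count = levels[level][idx]
        if not intersects(mbr, window):
            continue  # prune the whole subtree
        for child in range(first, first + count):
            if level == 0:
                item = order[child]
                if intersects(items[item], window):
                    hits.append(item)
            else:
                stack.append((level - 1, child))
    return sorted(hits)
```

Storing children contiguously means a node needs only an offset and a count rather than pointers, which is the general property that makes array layouts attractive for data-parallel hardware.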
Citations: 58
GPGPU-accelerated interesting interval discovery and other computations on GeoSpatial datasets: a summary of results
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2535837
S. Prasad, S. Shekhar, Michael McDermott, Xun Zhou, Michael R. Evans, S. Puri
For scalable GIS computations, it is imperative to fully exploit the modern hybrid architecture comprising a CPU-GPU pair. Existing parallel algorithms and data structures port reasonably well to multi-core CPUs, but poorly to GPGPUs because of the latter's atypical fine-grained, single-instruction multiple-thread (SIMT) architecture, extreme memory hierarchy and coalesced access requirements, and delicate CPU-GPU coordination. Recently, our parallelization of the state-of-the-art interesting sequence discovery algorithms calculated one-dimensional interesting intervals over an image representing the normalized difference vegetation indices of Africa within 31 ms on an nVidia 480GTX. To our knowledge, this paper reports the first parallelization of these algorithms. This allowed us to process 612 images representing biweekly data from July 1981 through Dec 2006 within 22 seconds. We were also able to pipe the output to a display in almost real-time, which would interest climate scientists. We have also undertaken parallelization of two key tree-based data structures, namely R-tree and heap, and have employed the parallel R-tree in a polygon overlay system. Parallelizing these data structures is hard because the underlying tree topology and the fine-grained computation lead to frequent accesses to the data structures, which severely stifles parallel efficiency.
Citations: 11
A cluster-based morphological filter for geospatial data analysis
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2534922
Zheng Cui, Keqi Zhang, Chengcui Zhang, Shu‐Ching Chen
LIDAR (Light Detection and Ranging) is a widely used technology for measuring terrain properties and for topographic mapping. Many filtering methods have been developed to process the geospatial data generated by LIDAR into bare earth digital terrain models. Among these methods, mathematical morphological filtering is a very effective and efficient way to separate ground and non-ground objects in LIDAR data. It achieves ideal results on flat terrain but does not work well on undulating, complex terrain with large non-ground objects. The reason is that a filtering window large enough to remove large non-ground objects also removes ground terrain objects. In mountainous terrain especially, this causes the hill cut-off problem, which is common to morphological filters. In this paper, a cluster-based morphological filter is proposed to improve the progressive morphological filter and make it work better on more undulating and complex terrain types. The filtering results demonstrate that the proposed method effectively preserves ground terrain objects and removes large non-ground objects.
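The hill cut-off problem described in this abstract can be reproduced on a toy one-dimensional elevation profile. The sketch below uses a plain morphological opening (erosion followed by dilation) with a fixed window; it is not the progressive morphological filter and not the cluster-based method the paper proposes. The names `erode`, `dilate`, and `opening`, and the sample profile, are made up for illustration.

```python
from typing import List


def erode(z: List[float], w: int) -> List[float]:
    """Minimum over a sliding window of half-width w (truncated at the edges)."""
    n = len(z)
    return [min(z[max(0, i - w):min(n, i + w + 1)]) for i in range(n)]


def dilate(z: List[float], w: int) -> List[float]:
    """Maximum over a sliding window of half-width w (truncated at the edges)."""
    n = len(z)
    return [max(z[max(0, i - w):min(n, i + w + 1)]) for i in range(n)]


def opening(z: List[float], w: int) -> List[float]:
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(z, w), w)


# Toy profile: a flat-roofed building (indices 2-5) next to a hill (indices 8-14).
profile = [0, 0, 5, 5, 5, 5, 0, 0, 1, 2, 3, 4, 3, 2, 1, 0, 0]

small = opening(profile, 1)  # 3-cell window: narrower than the building
large = opening(profile, 3)  # 7-cell window: wider than the building
```

With the small window the building survives the opening, so it would be kept as ground. With the large window the building is removed, but the hill is flattened to a plateau well below its original peak, which is the trade-off that a single fixed window size forces.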
Citations: 5
Object based image classification: state of the art and computational challenges
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2534927
Ranga Raju Vatsavai
As the spatial resolution of satellite remote sensing imagery advances towards sub-meter, the predominantly pixel-based (or single instance) classification methods need to be redesigned to take advantage of the spatial and structural patterns found in very high resolution imagery. In this work, we look at the advantages of object-based image analysis methods through the newer multiple instance learning schemes. We analyze these methods in the context of big geospatial data and point readers to some of the outstanding computational challenges.
Citations: 14
P2EST: parallelization philosophies for evaluating spatio-temporal queries
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2534929
Xiling Sun, Anan Yaagoub, Goce Trajcevski, P. Scheuermann, Hao Chen, Abhinav Kachhwaha
This work considers the impact of different contexts when attempting to exploit parallelization approaches for processing continuous spatio-temporal queries. More specifically, we are interested in the various trade-offs that may arise from differences between computing environments, for example multicore vs. cloud. Algorithmic solutions for parallel processing of spatio-temporal queries split the load among units, based on the data, the query, or both, and rely to a greater or lesser degree on a certain set of features of a given environment. We postulate that the incorporation of service features should be coupled with the algorithms/heuristics for processing particular queries, in addition to the volume of the data. We present the current version of the implementation of our P2EST system and analyze the execution of different heuristics for parallel processing of spatio-temporal range queries.
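Splitting the load by data, one of the options mentioned in this abstract, can be sketched for a spatio-temporal range query as follows. This is an illustration under assumptions rather than the P2EST system: points are `(x, y, t)` tuples, the query is a box in space plus a time interval, and the helper names are hypothetical. A thread pool is used only to keep the example self-contained; in CPython, threads will not speed up a CPU-bound scan like this one, so the sketch shows the partition, scan, and merge structure, not a performance result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t)
Range = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]


def in_range(p: Point, q: Range) -> bool:
    """True if the point lies inside the spatial box and the time interval."""
    x, y, t = p
    (x0, x1), (y0, y1), (t0, t1) = q
    return x0 <= x <= x1 and y0 <= y <= y1 and t0 <= t <= t1


def scan(chunk: List[Point], q: Range) -> List[Point]:
    """Work done by one unit: filter its own chunk of the data."""
    return [p for p in chunk if in_range(p, q)]


def parallel_range_query(points: List[Point], q: Range,
                         workers: int = 4) -> List[Point]:
    """Data partitioning: split the points into contiguous chunks,
    scan each chunk independently, then concatenate the partial results."""
    size = max(1, -(-len(points) // workers))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(scan, chunks, [q] * len(chunks)))
    return [p for part in parts for p in part]
```

Splitting by query instead would give every unit the full data set and a share of the pending queries; which choice works better depends on the environment, which is the trade-off the abstract sets out to study.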
Citations: 2
Predictive analytics with surveillance big data
Pub Date : 2012-11-06 DOI: 10.1145/2447481.2447491
S. Ayhan, J. Pesce, P. Comitz, G. Gerberick, S. Bliesner
In this paper, we describe a novel analytics system that enables query processing and predictive analytics over streams of aviation data. As part of an Internal Research and Development project, Boeing Research and Technology (BR&T) Advanced Air Traffic Management (AATM) built a system that makes predictions based upon descriptive patterns of archived aviation data. Boeing AATM has been receiving live Aircraft Situation Display to Industry (ASDI) data and archiving it for over two years. At present, there is no easy mechanism to perform analytics on the data. The incoming ASDI data is large, compressed, and requires correlation with other flight data before it can be analyzed. The service exposes this data once it has been uncompressed, correlated, and stored in a data warehouse for further analysis using a variety of descriptive, predictive, and possibly prescriptive analytics tools. The service is being built partially in response to requests from Boeing Commercial Aviation (BCA) for analysis of capacity and flow in the US National Airspace System (NAS). The service utilizes a custom tool for correlating the raw ASDI feed, IBM Warehouse with DB2 for data management, WebSphere Message Broker for real-time message brokering, SPSS Modeler for statistical analysis, and Cognos BI for front-end business intelligence (BI) visualization. This paper describes a scalable service architecture, its implementation, and the value it adds to the aviation domain.
Citations: 13
Elastic and effective spatio-temporal query processing scheme on Hadoop
Pub Date : 2012-11-06 DOI: 10.1145/2447481.2447486
Yunqin Zhong, Xiaomin Zhu, Jinyun Fang
Geospatial applications have become prevalent in both scientific research and industry. Spatio-temporal query processing is a fundamental issue for driving geospatial applications. However, state-of-the-art spatio-temporal query processing methods face significant challenges as data volumes grow and the number of concurrent users increases. In this paper we present a novel spatio-temporal querying scheme to provide efficient query processing over big geospatial data. The scheme improves query efficiency in three ways. First, taking geographic proximity and storage locality into consideration, we propose a geospatial data organization approach to achieve high aggregate I/O throughput, and design a distributed indexing framework for efficient pruning of the search space. Second, we design an indexing plus MapReduce query processing architecture to improve data retrieval efficiency and query computation efficiency. Third, we design a distributed caching model to accelerate access to hotspot spatial objects. We evaluate the effectiveness of our scheme with comprehensive experiments using real datasets and application scenarios.
Citations: 10