Proceedings of the ... ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems : ACM GIS. ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems: Latest Articles

Hide Your Distance: Privacy Risks and Protection in Spatial Accessibility Analysis.
Liyue Fan, Luca Bonomi

Measuring spatial accessibility to healthcare resources and facilities has long been an important problem in public health. For example, during disease outbreaks, sharing spatial accessibility data such as individual travel distances to health facilities is vital to policy making and designing effective interventions. However, sharing these data may raise privacy concerns, as information about individual data contributors (e.g., health status and residential address) may be disclosed. In this work, we investigate such unintended information leakage in spatial accessibility analysis. Specifically, we are interested in understanding whether sharing data for spatial accessibility computations may disclose individual participation (i.e., membership inference) and personally identifiable information (i.e., address inference). Furthermore, we propose two provably private algorithms that mitigate those privacy risks. The evaluation is conducted with real population and healthcare facility data from Mecklenburg County, NC and Nashville, TN. Compared to state-of-the-art privacy practices, our methods effectively reduce the risks of membership and address disclosure, while providing useful data for spatial accessibility analysis.
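
The abstract does not describe the two proposed algorithms, so the sketch below is not the authors' method. It only illustrates one standard way to perturb individual travel distances before sharing them: the Laplace mechanism applied to each value separately, assuming every distance is clipped to a known range. The function names, the `max_km` bound, and the sample values are all hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5              # uniform on [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)            # avoid log(0) at the lower edge
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def privatize_distances(distances_km, epsilon, max_km, rng=None):
    """Perturb each distance independently (a local-DP style release).

    Assumption: each true distance lies in [0, max_km] after clipping, so the
    sensitivity of a single released value is max_km.
    """
    rng = rng or random.Random()
    scale = max_km / epsilon
    released = []
    for d in distances_km:
        clipped = min(max(d, 0.0), max_km)
        noisy = clipped + laplace_noise(scale, rng)
        # Clipping the output is post-processing and keeps the guarantee.
        released.append(min(max(noisy, 0.0), max_km))
    return released


if __name__ == "__main__":
    print(privatize_distances([1.0, 5.0, 30.0], epsilon=1.0, max_km=20.0))
```

A smaller `epsilon` adds more noise and gives stronger protection at the cost of less accurate accessibility estimates, which is the trade-off the abstract refers to.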

DOI: 10.1145/3589132.3625656. Published: 2023-11-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10751042/pdf/
Citations: 0
GPU-based Real-time Contact Tracing at Scale.
Dejun Teng, Akshay Nehe, Prajeeth Emanuel, Furqan Baig, Jun Kong, Fusheng Wang

Contact tracing is gaining importance in controlling the spread of COVID-19. However, the enormous volume of frequently sampled tracing data brings major challenges for real-time processing. In this paper, we propose a GPU-based real-time contact tracing system based on spatial proximity queries with temporal constraints using location data. We provide dynamic indexing of moving objects using an adaptive partitioning schema on GPU with extremely low overhead. Our system optimizes the retrieval of contacted pairs to match both the requirements of contact tracing scenarios and GPU-centered parallelism. We propose an efficient contact evaluation mechanism to keep only the spatially and temporally valid contacts. Our experiments demonstrate that the system can achieve sub-second response for large-scale contact tracing of tens of millions of people, with a performance boost of two orders of magnitude over a CPU-based approach.
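
To make "spatial proximity queries with temporal constraints" concrete, here is a small CPU-only sketch. It is not the paper's GPU system or its adaptive partitioning; it assumes a fixed uniform grid, integer time steps, and a hypothetical rule that a contact is valid only if two people stay within `radius` for at least `min_steps` consecutive steps.

```python
from collections import defaultdict


def find_contacts(samples, radius, min_steps):
    """Return the set of (a, b) id pairs, a < b, that form a valid contact.

    samples: iterable of (person_id, step, x, y) with integer steps.
    """
    # Bucket the samples of each step into grid cells of side `radius`.
    by_step = defaultdict(lambda: defaultdict(list))
    for pid, step, x, y in samples:
        cell = (int(x // radius), int(y // radius))
        by_step[step][cell].append((pid, x, y))

    # Spatial filter: record the steps at which each pair was close.
    close_steps = defaultdict(set)
    for step, grid in by_step.items():
        for (cx, cy), members in grid.items():
            # Anyone within `radius` must be in this cell or one of its 8 neighbors.
            candidates = []
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    candidates.extend(grid.get((cx + dx, cy + dy), []))
            for pid_a, xa, ya in members:
                for pid_b, xb, yb in candidates:
                    if pid_a < pid_b and (xa - xb) ** 2 + (ya - yb) ** 2 <= radius ** 2:
                        close_steps[(pid_a, pid_b)].add(step)

    # Temporal filter: keep pairs with a long enough run of consecutive steps.
    contacts = set()
    for pair, steps in close_steps.items():
        run, prev = 0, None
        for s in sorted(steps):
            run = run + 1 if prev is not None and s == prev + 1 else 1
            prev = s
            if run >= min_steps:
                contacts.add(pair)
                break
    return contacts


if __name__ == "__main__":
    demo = [("A", 0, 0.0, 0.0), ("B", 0, 1.0, 0.0),
            ("A", 1, 0.5, 0.0), ("B", 1, 1.5, 0.5)]
    print(find_contacts(demo, radius=2.0, min_steps=2))
```

The grid limits distance checks to nearby cells, which is the same pruning idea an index provides; a GPU implementation would run the per-cell work in parallel.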

DOI: 10.1145/3474717.3483627. Published: 2021-11-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8849613/pdf/nihms-1767013.pdf
Citations: 0
Mining Public Datasets for Modeling Intra-City PM2.5 Concentrations at a Fine Spatial Resolution.
Yijun Lin, Dimitrios Stripelis, Yao-Yi Chiang, José Luis Ambite, Rima Habre, Fan Pan, Sandrah P Eckel

Air quality models are important for studying the impact of air pollutants on health conditions at a fine spatiotemporal scale. Existing work typically relies on area-specific, expert-selected attributes of pollution emissions (e.g., transportation) and dispersion (e.g., meteorology) for building the model for each combination of study areas, pollutant types, and spatiotemporal scales. In this paper, we present a data mining approach that utilizes publicly available OpenStreetMap (OSM) data to automatically generate an air quality model for the concentrations of fine particulate matter less than 2.5 μm in aerodynamic diameter at various temporal scales. Our experiment shows that our (domain-) expert-free model could generate accurate PM2.5 concentration predictions, which can be used to improve air quality models that traditionally rely on expert-selected input. Our approach also quantifies the impact on air quality from a variety of geographic features (i.e., how various types of geographic features such as parking lots and commercial buildings affect air quality and from what distance) representing mobile, stationary and area natural and anthropogenic air pollution sources. This approach is particularly important for enabling the construction of context-specific spatiotemporal models of air pollution, allowing investigations of the impact of air pollution exposures on sensitive populations such as children with asthma at scale.
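
One way to read "how various types of geographic features affect air quality and from what distance" is as a feature-construction step: count the features of each type inside several buffers around a monitoring site and hand those counts to a model. The sketch below shows only that step, under stated assumptions. It is not the authors' pipeline; the point-feature representation, the buffer radii, and the coordinates are hypothetical, and real OSM data would also contain lines and polygons.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def buffer_counts(site, features, radii_m):
    """Count features of each type within each buffer radius of a site.

    site: (lat, lon); features: list of (feature_type, lat, lon).
    Returns {(feature_type, radius_m): count}, omitting zero counts.
    """
    counts = {}
    for ftype, lat, lon in features:
        d = haversine_m(site[0], site[1], lat, lon)
        for r in radii_m:
            if d <= r:
                counts[(ftype, r)] = counts.get((ftype, r), 0) + 1
    return counts


if __name__ == "__main__":
    features = [("parking", 34.0005, -118.0), ("commercial", 34.0, -118.0)]
    print(buffer_counts((34.0, -118.0), features, [100, 500]))
```

Each `(feature_type, radius_m)` key becomes one candidate predictor, so a model fitted on these counts can indicate which feature types matter and at what distance.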

DOI: 10.1145/3139958.3140013. Published: 2017-11-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5841919/pdf/nihms944848.pdf
Citations: 0
iSPEED: an Efficient In-Memory Based Spatial Query System for Large-Scale 3D Data with Complex Structures.
Yanhui Liang, Jun Kong, Hoang Vo, Fusheng Wang

Recent advances in digital pathology make it possible to support 3D tissue-based investigation of human diseases at extremely high resolutions. Exploring spatial relationships and patterns among massive 3D micro-anatomic biological objects such as blood vessels and cells derived from 3D pathology image volumes plays a critical role in studying human diseases. In this paper, we present our work on building an effective and scalable in-memory based spatial query system iSPEED for large-scale 3D data with complex structures. To achieve low latency, iSPEED stores data in memory with effective progressive compression for each 3D object with successive levels of detail. To minimize search space and computation cost, iSPEED pre-generates global spatial indexes in memory and employs on-demand indexing at run-time. In particular, iSPEED exploits structural indexing for complex structured objects in distance-based queries. iSPEED provides a 3D spatial query engine that can be invoked on-demand to run many instances in parallel implemented with, but not limited to, MapReduce. iSPEED builds in-memory indexes and decompresses data on-demand, which keeps the memory footprint minimal. We evaluate iSPEED with two representative queries: 3D spatial joins and 3D spatial proximity estimation. Our experiments demonstrate that iSPEED significantly improves the performance over traditional non-memory based spatial query systems.

DOI: 10.1145/3139958.3139961 (https://doi.org/10.1145/3139958.3139961). Published: 2017-11-01.
Citations: 12
SparkGIS: Resource Aware Efficient In-Memory Spatial Query Processing.
Furqan Baig, Hoang Vo, Tahsin Kurc, Joel Saltz, Fusheng Wang

Much effort has been devoted to supporting high performance spatial queries on large volumes of spatial data in distributed spatial computing systems, especially in the MapReduce paradigm. Recent works have focused on extending spatial MapReduce frameworks to leverage high performance in-memory distributed processing capabilities of systems such as Spark. However, the performance advantage comes with the requirement of having enough memory and comprehensive configuration. When this requirement is not met, processing falls back to disk I/O, defeating the purpose of such systems, or, in the worst case, runs out of memory and fails the job. The problem is aggravated further for spatial processing since the underlying in-memory systems are oblivious of spatial data features and characteristics. In this paper we present SparkGIS, an in-memory oriented spatial data querying system for high throughput and low latency spatial query handling by adapting Apache Spark's distributed processing capabilities. It supports basic spatial queries including containment, spatial join and k-nearest neighbor and allows extending these to complex query pipelines. SparkGIS mitigates skew in distributed processing by supporting several dynamic partitioning algorithms suitable for a rich set of contemporary application scenarios. Multilevel global and local, pre-generated and on-demand in-memory indexes allow SparkGIS to prune input data and apply compute intensive operations on a subset of relevant spatial objects only. Finally, SparkGIS employs dynamic query rewriting to gracefully manage large spatial query workflows that exceed available distributed resources. Our comparative evaluation has shown that the performance of SparkGIS is on par with contemporary Spark based platforms for relatively smaller queries and outperforms them for larger data and memory intensive workflows by dynamic query rewriting and efficient spatial data management.

Published: 2017-11-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6054321/pdf/nihms980878.pdf
Citations: 0
Scalable 3D Spatial Queries for Analytical Pathology Imaging with MapReduce.
Yanhui Liang, Hoang Vo, Ablimit Aji, Jun Kong, Fusheng Wang

3D analytical pathology imaging examines high resolution 3D image volumes of human tissues to facilitate biomedical research and provide potentially effective diagnostic assistance. Such an approach, quantitative analysis of large-scale 3D pathology image volumes, generates tremendous amounts of spatially derived 3D micro-anatomic objects, such as 3D blood vessels and nuclei. Spatial exploration of such massive 3D spatial data requires effective and efficient querying methods. In this paper, we present a scalable and efficient 3D spatial query system for querying massive 3D spatial data based on MapReduce. The system provides an on-demand spatial querying engine which can be executed with as many instances as needed on MapReduce at runtime. Our system supports multiple types of spatial queries on MapReduce through 3D spatial data partitioning, customizable 3D spatial query engine, and implicit parallel spatial query execution. We utilize multi-level spatial indexing to achieve efficient query processing, including global partition indexing for data retrieval and on-demand local spatial indexing for spatial query processing. We evaluate our system with two representative queries: 3D spatial joins and 3D k-nearest neighbor query. Our experiments demonstrate that our system scales to a large number of computing nodes, and efficiently handles data-intensive 3D spatial queries that are challenging in analytical pathology imaging.
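
As a rough illustration of how a global partition index can serve a 3D k-nearest neighbor query, the sketch below buckets points into cubic cells and searches outward ring by ring. It runs in a single process and is not the paper's MapReduce system; the uniform cubic partitioning, the function names, and the stopping rule are assumptions, and the per-partition local index is left out.

```python
import math
from collections import defaultdict
from itertools import product


def build_global_index(points, cell):
    """Map each cubic cell (side `cell`) to the ids of the points inside it."""
    index = defaultdict(list)
    for pid, p in points.items():
        index[tuple(int(c // cell) for c in p)].append(pid)
    return index


def knn(points, index, cell, query, k):
    """Return up to k (distance, id) pairs nearest to `query`, closest first."""
    qc = tuple(int(c // cell) for c in query)
    # Farthest ring (in cell offsets) that can still hold any data.
    max_r = max((max(abs(c[i] - qc[i]) for i in range(3)) for c in index), default=0)
    found = []
    for r in range(max_r + 1):
        for off in product(range(-r, r + 1), repeat=3):
            if max(abs(o) for o in off) != r:
                continue  # visit only the shell of cells at ring r
            key = tuple(q + o for q, o in zip(qc, off))
            for pid in index.get(key, ()):
                found.append((math.dist(query, points[pid]), pid))
        found.sort()
        # Points in unvisited cells are farther than r * cell from the query,
        # so the search can stop once the k-th hit is within that bound.
        if len(found) >= k and found[k - 1][0] <= r * cell:
            break
    return found[:k]


if __name__ == "__main__":
    pts = {"a": (1, 1, 1), "b": (2, 1, 1), "c": (25, 25, 25)}
    idx = build_global_index(pts, cell=10)
    print(knn(pts, idx, 10, (0, 0, 0), k=2))
```

Only cells near the query are read, which is the "data retrieval" role the abstract assigns to the global index; a local index would then speed up the distance checks inside each retrieved partition.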

DOI: 10.1145/2996913.2996925 (https://doi.org/10.1145/2996913.2996925). Published: 2016-10-01.
Citations: 9
Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data.
Ablimit Aji, Fusheng Wang, Joel H Saltz

Support of high performance queries on large volumes of scientific spatial data is becoming increasingly important in many applications. This growth is driven by not only geospatial problems in numerous fields, but also emerging scientific applications that are increasingly data- and compute-intensive. For example, digital pathology imaging has become an emerging field during the past decade, where examination of high resolution images of human tissue specimens enables more effective diagnosis, prediction and treatment of diseases. Systematic analysis of large-scale pathology images generates tremendous amounts of spatially derived quantifications of micro-anatomic objects, such as nuclei, blood vessels, and tissue regions. Analytical pathology imaging provides high potential to support image-based computer-aided diagnosis. One major requirement for this is effective querying of such enormous amounts of data with fast response, which is faced with two major challenges: the "big data" challenge and the high computation complexity. In this paper, we present our work towards building a high performance spatial query system for querying massive spatial data on MapReduce. Our framework takes an on-demand index building approach for processing spatial queries and a partition-merge approach for building parallel spatial query pipelines, which fits nicely with the computing model of MapReduce. We demonstrate our framework on supporting multi-way spatial joins for algorithm evaluation and nearest neighbor queries for micro-anatomic objects. To reduce query response time, we propose cost-based query optimization to mitigate the effect of data skew. Our experiments show that the framework can efficiently support complex analytical spatial queries on MapReduce.
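
The partition-merge idea can be sketched in plain Python: assign every object to the grid partitions it overlaps, join within each partition independently, then merge the partial results and drop duplicates. This is a single-process illustration under assumptions, not the paper's framework. It uses axis-aligned 3D boxes and a fixed grid, the names are hypothetical, and a real deployment would run each partition's join as a separate MapReduce task.

```python
from collections import defaultdict
from itertools import product


def cells_for_box(box, cell):
    """Yield every grid cell that an axis-aligned box ((x0,y0,z0),(x1,y1,z1)) overlaps."""
    lo, hi = box
    ranges = [range(int(a // cell), int(b // cell) + 1) for a, b in zip(lo, hi)]
    return product(*ranges)


def boxes_intersect(a, b):
    """True if two axis-aligned boxes overlap or touch."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def partitioned_join(left, right, cell):
    """Join two {id: box} datasets; return sorted intersecting (left_id, right_id) pairs."""
    # Partition step: an object is copied into every partition it touches.
    parts = defaultdict(lambda: ([], []))
    for lid, box in left.items():
        for c in cells_for_box(box, cell):
            parts[c][0].append(lid)
    for rid, box in right.items():
        for c in cells_for_box(box, cell):
            parts[c][1].append(rid)

    # Join each partition on its own, then merge; the set removes pairs that
    # were found in more than one partition.
    merged = set()
    for lids, rids in parts.values():
        for lid in lids:
            for rid in rids:
                if boxes_intersect(left[lid], right[rid]):
                    merged.add((lid, rid))
    return sorted(merged)


if __name__ == "__main__":
    vessels = {"v1": ((0, 0, 0), (5, 5, 5))}
    nuclei = {"n1": ((4, 4, 4), (12, 12, 12))}
    print(partitioned_join(vessels, nuclei, cell=10))
```

Because objects that cross partition boundaries are replicated, the merge step has to deduplicate; uneven partition sizes are also where the data skew mentioned in the abstract would show up.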

DOI: 10.1145/2424321.2424361. Published: 2012-11-06. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909999/pdf/nihms480782.pdf
Citations: 0