
Latest publications from the International Workshop on Analytics for Big Geospatial Data

Big data as a service from an urban information system
Pub Date : 2016-10-31 DOI: 10.1145/3006386.3006391
Alexandre Sorokine, R. Karthik, A. King, B. Bhaduri
Big Data has already proven itself as a valuable tool that lets geographers and urban researchers utilize large data resources to generate new insights. However, wider adoption of Big Data techniques in these areas is impeded by a number of difficulties in both knowledge discovery and data and science production. Typically users face such problems as disparate and scattered data, data management, spatial searching, insufficient computational capacity for data-driven analysis and modelling, and the lack of tools to quickly visualize and summarize large data and analysis results. Here we propose an architecture for an Urban Information System (UrbIS) that mitigates these problems by utilizing the Big Data as a Service (BDaaS) concept. With technological roots in High-performance Computing (HPC), BDaaS is based on the idea of outsourcing computations to different computing paradigms, scalable to super-computers. UrbIS aims to incorporate federated metadata search, integrated modeling and analysis, and geovisualization into a single seamless workflow. The system is under active development and is built around various emerging technologies that include hybrid and NoSQL databases, massively parallel systems, GPU computing, and WebGL-based geographic visualization. UrbIS is designed to facilitate the use of Big Data across multiple cities to better understand how urban areas impact the environment and how climate change and other environmental change impact urban areas.
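The service-oriented idea at the heart of BDaaS can be illustrated with a toy dispatcher: an analysis request names a computation, and the service routes it to whichever registered compute backend executes it, so the client never deals with the execution environment directly. This is a minimal sketch under invented names (`UrbISService`, `register`, `run`), not the authors' actual API.

```python
from typing import Callable, Dict

class UrbISService:
    """Route named analyses to registered compute backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable] = {}

    def register(self, analysis: str, backend: Callable) -> None:
        self._backends[analysis] = backend

    def run(self, analysis: str, data):
        if analysis not in self._backends:
            raise KeyError(f"no backend registered for {analysis!r}")
        return self._backends[analysis](data)

service = UrbISService()
# A trivial "backend": summarise a list of sensor readings. In a real
# deployment this could dispatch to an HPC job or a NoSQL query instead.
service.register("summary", lambda xs: {"n": len(xs), "mean": sum(xs) / len(xs)})

result = service.run("summary", [1.0, 2.0, 3.0])
print(result)  # {'n': 3, 'mean': 2.0}
```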
Citations: 3
Towards massive spatial data validation with SpatialHadoop
Pub Date : 2016-10-31 DOI: 10.1145/3006386.3006392
S. Migliorini, A. Belussi, Mauro Negri, G. Pelagatti
Spatial data usually encapsulate semantic characterizations of features which carry important meaning and relations among objects, such as the containment between the extension of a region and of its constituent parts. The GeoUML methodology allows one to bridge the gap between the definition of spatial integrity constraints at the conceptual level and the realization of validation procedures. In particular, it automatically generates SQL validation queries starting from a conceptual specification and using predefined SQL templates. These queries can be used to check data contained in spatial relational databases, such as PostGIS. However, the quality requirements and the amount of available data are growing considerably, making the execution of these validation procedures unfeasible. The map-reduce paradigm can be effectively applied in this context, since the same test can be performed in parallel on different data chunks and the partial results can then be combined to obtain the final set of violating objects. Pigeon is a data-flow language defined on top of SpatialHadoop which provides spatial data types and functions. The aim of this paper is to explore the possibility of extending the GeoUML methodology by automatically producing Pigeon validation procedures starting from a set of predefined Pigeon macros. These scripts can be used in a map-reduce environment in order to make the validation of large datasets feasible.
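The map/reduce validation pattern described here can be sketched in a few lines: the same integrity check runs on each data chunk independently (map), and the partial sets of violating objects are unioned (reduce). The containment constraint and record layout below are invented for illustration, not the GeoUML templates themselves.

```python
from functools import reduce

def check_chunk(chunk):
    """Map step: return ids of regions whose part extends beyond the whole."""
    return {r["id"] for r in chunk if not r["part_area"] <= r["whole_area"]}

# Two chunks of a partitioned dataset; "B" violates containment.
chunks = [
    [{"id": "A", "part_area": 2, "whole_area": 10},
     {"id": "B", "part_area": 12, "whole_area": 10}],
    [{"id": "C", "part_area": 5, "whole_area": 5}],
]

# Reduce step: union the per-chunk violation sets.
violations = reduce(set.union, (check_chunk(c) for c in chunks), set())
print(sorted(violations))  # ['B']
```

Because each chunk is checked independently, the map step parallelises trivially; only the cheap set union is sequential.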
Citations: 10
Analytics on public transport delays with spatial big data
Pub Date : 2016-10-31 DOI: 10.1145/3006386.3006387
Jayanth Raghothama, V. M. Shreenath, S. Meijer
The increasing pervasiveness of location-aware technologies is leading to the rise of large, spatio-temporal datasets and to the opportunity of discovering usable knowledge about the behaviors of people and objects. Applied extensively in transportation, spatial big data and its analytics can deliver useful insights on a number of different issues such as congestion, delays, public transport reliability and so on. Predominantly studied for its use in operational management, spatial big data can be used to provide insight in strategic applications as well, from planning and design to evaluation and management. Such large scale, streaming spatial big data can be used in the improvement of public transport, for example the design of public transport networks and reliability. In this paper, we analyze GTFS data from the cities of Stockholm and Rome to gain insight on the sources and factors influencing public transport delays in the cities. The analysis is performed on a combination of GTFS data with data from other sources. The paper points to key issues in the analysis of real time data, driven by the contextual setting in the two cities.
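The core delay computation on a GTFS feed can be sketched simply: scheduled arrivals (the `arrival_time` field of `stop_times.txt`, keyed by trip and stop) are joined with observed arrivals, and the per-stop delay is averaged. The field names follow the GTFS specification; the observed times below are invented stand-ins for a real-time feed.

```python
from collections import defaultdict

def to_seconds(t):
    """GTFS times are 'HH:MM:SS' strings; convert to seconds since midnight."""
    h, m, s = map(int, t.split(":"))
    return h * 3600 + m * 60 + s

# Scheduled and observed arrivals keyed by (trip_id, stop_id).
scheduled = {("trip1", "stopA"): "08:00:00", ("trip1", "stopB"): "08:10:00"}
observed  = {("trip1", "stopA"): "08:02:30", ("trip1", "stopB"): "08:15:00"}

per_stop = defaultdict(list)
for key, sched in scheduled.items():
    if key in observed:
        per_stop[key[1]].append(to_seconds(observed[key]) - to_seconds(sched))

mean_delay = {stop: sum(d) / len(d) for stop, d in per_stop.items()}
print(mean_delay)  # {'stopA': 150.0, 'stopB': 300.0}
```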
Citations: 10
Spatial computing goes to education and beyond: can semantic trajectory characterize students?
Pub Date : 2016-10-31 DOI: 10.1145/3006386.3006389
J. Heo, Sanghyun Yoon, Won Seob Oh, J. Ma, Sungha Ju, S. Yun
Spatial big data (SBD) has been utilized in many fields, and we propose applying SBD analytics to education using semantic trajectory data of undergraduate students at the Songdo International Campus of Yonsei University. Higher education is under pressure from disruptive innovation, so colleges and universities strive to provide not only better education but also customized services to every single student, as a matter of survival in the coming wave of change. The entire research plan is to present a smart campus with SBD analytics for education, safety, health, and campus management, and this research is composed of four specific items: (1) to produce 3D mapping for the project site; (2) to build semantic trajectories based on class attendance records, dorm gate entry records, etc.; (3) to collect pedagogical and other parameters of students; (4) to find relationships between trajectory patterns and pedagogical characteristics. Successful completion of the research would set a milestone in using semantic trajectories to predict student performance and characteristics, and could further lead to a proactive student care system and a student activity guiding system. It can eventually provide better customized education services to participating students.
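The semantic-trajectory construction step in item (2) amounts to merging timestamped, labelled events from separate sources (class attendance, dorm gate entries) into one time-ordered trajectory per student. A minimal sketch with invented records:

```python
# Each record is (student_id, timestamp, semantic_label); the two lists
# stand in for separate source systems.
attendance = [("s1", "2016-03-02 09:00", "lecture:GIS101")]
gate_logs  = [("s1", "2016-03-02 08:40", "dorm:exit"),
              ("s1", "2016-03-02 18:05", "dorm:entry")]

trajectories = {}
for student, ts, label in attendance + gate_logs:
    trajectories.setdefault(student, []).append((ts, label))
for events in trajectories.values():
    events.sort()  # ISO-style timestamps sort correctly as strings

print(trajectories["s1"][0])  # ('2016-03-02 08:40', 'dorm:exit')
```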
Citations: 6
High-performance polyline intersection based spatial join on GPU-accelerated clusters
Pub Date : 2016-10-31 DOI: 10.1145/3006386.3006390
Simin You, Jianting Zhang, L. Gruenwald
The rapidly growing volumes of spatial data have brought significant challenges to developing high-performance spatial data processing techniques in parallel and distributed computing environments. Spatial joins are important data management techniques for gaining insights from large-scale geospatial data. While several distributed spatial join techniques based on spatial partitions have been implemented on top of existing Big Data systems, they are not capable of natively exploiting the massively data-parallel computing power provided by modern commodity Graphics Processing Units (GPUs). In this study, as an important component of our research initiative in developing high-performance spatial join techniques on GPUs, we have designed and implemented a polyline intersection based spatial join technique that is capable of exploiting massively data-parallel computing power on GPUs. The proposed polyline intersection based spatial join technique is integrated into a customized lightweight distributed execution engine that natively supports spatial partitions. We empirically evaluate the performance of the proposed spatial join technique on both a standalone GPU-equipped workstation and Amazon EC2 GPU-accelerated clusters, and demonstrate its high performance compared with the state of the art.
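The geometric primitive underlying a polyline-intersection join is the segment-crossing test, typically done with orientation (cross-product sign) checks. On a GPU this predicate would be evaluated for many candidate segment pairs in parallel; the sketch below shows it sequentially, and for clarity handles only proper crossings (collinear and endpoint-touching cases are omitted).

```python
def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1, -1, or 0."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_cross(a, b, c, d):
    """True if segment ab properly crosses segment cd."""
    return (orient(a, b, c) != orient(a, b, d) and
            orient(c, d, a) != orient(c, d, b))

print(segments_cross((0, 0), (2, 2), (0, 2), (2, 0)))  # True
print(segments_cross((0, 0), (1, 0), (0, 1), (1, 1)))  # False
```

Two polylines intersect if any segment of one crosses any segment of the other, which is exactly the pairwise workload that maps well onto massively parallel hardware.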
Citations: 6
Building knowledge graph from public data for predictive analysis: a case study on predicting technology future in space and time
Pub Date : 2016-10-31 DOI: 10.1145/3006386.3006388
Weiwei Duan, Yao-Yi Chiang
A domain expert can process heterogeneous data to make meaningful interpretations or predictions from the data. For example, by looking at research papers and patent records, an expert can determine the maturity of an emerging technology and predict the geographic location(s) and time (e.g., in a certain year) where and when the technology will be a success. However, this is an expert- and manual-intensive task. This paper presents an end-to-end system that integrates heterogeneous data sources into a knowledge graph in the RDF (Resource Description Framework) format using an ontology. Then the user can easily query the knowledge graph to prepare the required data for different types of predictive analysis tools. We show a case study of predicting the (geographic) center(s) of fuel cell technologies using data collected from public sources to demonstrate the feasibility of our system. The system extracts, cleanses, and augments data from public sources including research papers and patent records. Next, the system uses an ontology-based data integration method to generate knowledge graphs in the RDF format to enable users to switch quickly between machine learning models for predictive analytic tasks. We tested the system using the Support Vector Machine and Multiple Hidden Markov Models and achieved 66.7% and 83.3% accuracy on the city and year levels of spatial and temporal resolutions, respectively.
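The triple representation at the centre of the approach can be illustrated in miniature: heterogeneous records (papers, patents) are normalised to subject-predicate-object triples, which simple pattern queries then traverse. The predicate names and identifiers below are invented for illustration; a real system would use RDF vocabularies, an ontology, and an RDF store rather than an in-memory set.

```python
triples = {
    ("paper:42", "mentions",    "tech:fuel_cell"),
    ("paper:42", "publishedIn", "year:2004"),
    ("patent:7", "mentions",    "tech:fuel_cell"),
    ("patent:7", "filedIn",     "city:Tokyo"),
}

def query(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Which documents mention fuel cells?
docs = {s for s, _, _ in query(p="mentions", o="tech:fuel_cell")}
print(sorted(docs))  # ['paper:42', 'patent:7']
```

Once all sources share this shape, swapping the downstream machine learning model only changes how the query results are featurised, not how the data is integrated.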
Citations: 9
Agent based urban growth modeling framework on Apache Spark
Pub Date : 2016-10-31 DOI: 10.1145/3006386.3007610
Qiang Zhang, Ranga Raju Vatsavai, Ashwin Shashidharan, D. Berkel
The simulation of urban growth is an important part of urban planning and development. Due to large data volumes and computational challenges, urban growth simulation models demand efficient data analytic frameworks for scaling them to large geographic regions. Agent-based models are widely used to observe and analyze urban growth simulations at various scales. The incorporation of an agent-based model makes the scaling task even harder due to communication and coordination among agents. Many existing agent-based model frameworks were implemented using traditional shared and distributed memory programming models. On the other hand, Apache Spark is becoming a popular platform for distributed big data in-memory analytics. This paper presents an implementation of an agent-based sub-model in the Apache Spark framework. With in-memory computation, the Spark implementation outperforms the traditional distributed memory implementation using MPI. This paper provides (i) an overview of our framework, capable of running urban growth simulations at a fine resolution of 30-meter grid cells, (ii) a scalable approach using Apache Spark to implement an agent-based model for simulating human decisions, and (iii) a comparative analysis of the performance of Apache Spark and MPI based implementations.
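The pattern such a sub-model maps onto Spark can be sketched as follows: each agent's decision step is a pure function applied independently across a partitioned collection (in PySpark, roughly `rdd.map(step).collect()`), after which results feed the next simulation round. The development rule below is invented for illustration, and plain Python `map` stands in for the distributed map.

```python
def step(agent):
    """One decision round: an agent develops its cell if utility beats a threshold."""
    developed = agent["utility"] > 0.5
    return {**agent, "developed": developed}

agents = [{"id": 0, "utility": 0.9}, {"id": 1, "utility": 0.2}]

# In Spark this would be sc.parallelize(agents).map(step).collect();
# the per-agent independence is what lets the round run in parallel.
next_state = list(map(step, agents))
print([a["developed"] for a in next_state])  # [True, False]
```

Steps needing inter-agent coordination (e.g. competing for the same cell) would require a shuffle or reduce between rounds, which is the communication cost the paper's comparison with MPI speaks to.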
Citations: 5
Big earth observation data analytics: matching requirements to system architectures
Pub Date : 2016-10-31 DOI: 10.1145/3006386.3006393
G. Câmara, L. F. Assis, G. R. Queiroz, K. Ferreira, E. Llapa, L. Vinhas
Earth observation satellites produce petabytes of geospatial data. To manage large data sets, researchers need stable and efficient solutions that support their analytical tasks. Since the technology for big data handling is evolving rapidly, researchers find it hard to keep up with new developments. To lower this burden, we argue that researchers should not have to convert their algorithms to specialised environments. Imposing a new API on researchers is counterproductive and slows down progress on big data analytics. This paper assesses the cost of research-friendliness in a case where the researcher has developed an algorithm in the R language and wants to use the same code for big data analytics. We take an algorithm for remote sensing time series analysis and compare its use on map/reduce and on array database architectures. While the performance of the algorithm on big data sets is similar, organising image data for processing in Hadoop is more complicated and time-consuming than handling images in SciDB. Therefore, the combination of the array database SciDB and the R language offers adequate support for researchers working on big Earth observation data analytics.
Citations: 49
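The abstract above contrasts two ways of organising image time series for the same per-pixel analysis: map/reduce-style keyed records versus a dense array-database layout. A minimal Python sketch of that contrast, using a toy `detect_change` statistic and invented record/cube layouts as stand-ins for the paper's R algorithm and SciDB/Hadoop storage — not the authors' implementation:

```python
from collections import defaultdict

def detect_change(series):
    """Toy per-pixel analysis: largest jump between consecutive observations.
    A hypothetical stand-in for the paper's R time-series algorithm."""
    return max(abs(b - a) for a, b in zip(series, series[1:]))

def map_reduce_style(records):
    """Map/reduce organisation: observations arrive as unordered
    (pixel_id, time, value) records and must be grouped and re-sorted."""
    groups = defaultdict(list)
    for pixel, t, value in records:        # "map": key each record by pixel
        groups[pixel].append((t, value))
    out = {}
    for pixel, obs in groups.items():      # "reduce": restore time order, analyse
        obs.sort()
        out[pixel] = detect_change([v for _, v in obs])
    return out

def array_style(cube):
    """Array-database organisation: data is already a dense [x][y][t] cube,
    so the analysis applies directly along the time axis."""
    return {(x, y): detect_change(cube[x][y])
            for x in range(len(cube)) for y in range(len(cube[0]))}

records = [(0, 2, 5.0), (0, 0, 1.0), (0, 1, 2.0)]
cube = [[[1.0, 2.0, 5.0]]]
# Same answer either way; the difference is the data wrangling needed first.
assert map_reduce_style(records)[0] == array_style(cube)[(0, 0)] == 3.0
```

The extra grouping and sorting step in `map_reduce_style` is a small-scale illustration of the paper's finding that organising imagery for Hadoop is more work than addressing it as a dense array.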
Discovering persistent change windows in spatiotemporal datasets: a summary of results
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2534928
Xun Zhou, S. Shekhar, Dev Oliver
Given a region S comprised of locations that each have a time series of length |T|, the Persistent Change Windows (PCW) discovery problem aims to find all spatial window and temporal interval pairs <Si, Ti> that exhibit persistent change of attribute values over time. PCW discovery is important for critical societal applications such as detecting desertification and deforestation and monitoring urban sprawl. The PCW discovery problem is challenging due to the large number of candidate patterns, the lack of monotonicity (sub-regions of a PCW may not show persistent change), the lack of predefined window sizes for the ST windows, and large datasets of detailed resolution and high volume, i.e., spatial big data. Previous approaches to ST change footprint discovery have focused on local spatial footprints for persistent change discovery and may not guarantee completeness. In contrast, we propose a space-time window enumeration and pruning (SWEP) approach that considers zonal spatial footprints when finding persistent change patterns. We provide theoretical analysis of SWEP's correctness, completeness, and space-time complexity. We also present a case study on vegetation data that demonstrates the usefulness of the proposed approach. Experimental evaluation on synthetic data shows that the SWEP approach is orders of magnitude faster than the naive approach.
Citations: 6
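The "large number of candidate patterns" the abstract mentions comes from enumerating every spatial window paired with every temporal interval. A deliberately naive sketch of that enumeration, assuming a simplified persistence test (monotone increase over axis-aligned windows) invented for illustration — this is the baseline SWEP's pruning avoids, not the SWEP algorithm itself:

```python
from itertools import product

def persistent(series, t0, t1, theta=0.0):
    """A series shows persistent change on [t0, t1] if every consecutive
    step increases by more than theta (a hypothetical simplification of
    the paper's persistence definition)."""
    return all(series[t + 1] - series[t] > theta for t in range(t0, t1))

def naive_pcw(grid, min_len=2):
    """Enumerate all axis-aligned windows x temporal intervals over a
    grid[row][col] = time series layout; keep pairs whose every cell
    passes the persistence test. Cost grows with rows^2 * cols^2 * T^2,
    which is why pruning matters at spatial-big-data scale."""
    rows, cols, T = len(grid), len(grid[0]), len(grid[0][0])
    found = []
    for r0, r1 in product(range(rows), repeat=2):
        if r1 < r0:
            continue
        for c0, c1 in product(range(cols), repeat=2):
            if c1 < c0:
                continue
            for t0 in range(T):
                for t1 in range(t0 + min_len, T):
                    if all(persistent(grid[r][c], t0, t1)
                           for r in range(r0, r1 + 1)
                           for c in range(c0, c1 + 1)):
                        found.append(((r0, c0, r1, c1), (t0, t1)))
    return found

# One cell whose value rises over the first three steps, then flattens:
assert naive_pcw([[[1, 2, 3, 3]]]) == [((0, 0, 0, 0), (0, 2))]
```

Note the lack of monotonicity the abstract highlights: a window passing this test says nothing about its sub-windows under a richer persistence definition, so simple Apriori-style pruning does not apply directly.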
Algorithms for fundamental spatial aggregate operations over regions
Pub Date : 2013-11-04 DOI: 10.1145/2534921.2534930
Mark McKenney, Brian Olsen
Aggregate operators are a useful class of operators in relational databases. In this paper, we examine spatial aggregate operators over regions. Spatial aggregates are defined to operate over a set of regions, and return a single region as a result. We systematically identify individual spatial aggregate operations by extending existing spatial operations into aggregate form. Semantic meaning for each operator is defined over a specified data model. Once defined, algorithms for computing spatial aggregates over regions are provided. We show that all proposed aggregates can be computed using a single algorithm. Furthermore, we provide serial and parallel algorithm constructions that can take advantage of vector co-processors, such as graphics processing units (GPUs), and that can be integrated into map/reduce queries to take advantage of big data-style clusters. Example queries and their results are provided.
Citations: 4
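The reason a spatial aggregate can be computed serially, on GPUs, or inside map/reduce queries is that the combining operation is an associative fold over the set of regions. A sketch under the assumption that regions are rasterised to sets of grid cells (the paper works on a vector region model; the fold structure is what this illustrates):

```python
from functools import reduce

# Hypothetical region representation: a frozenset of occupied grid cells.
def agg_union(regions):
    """Aggregate union: fold all regions into one result region."""
    return reduce(frozenset.union, regions, frozenset())

def agg_intersection(regions):
    """Aggregate intersection over a non-empty iterable of regions."""
    regions = list(regions)
    return reduce(frozenset.intersection, regions[1:], regions[0])

def parallel_union(chunks):
    """Because union is associative and commutative, partial aggregates
    can be computed independently (one per worker or GPU block) and then
    combined -- the structure map/reduce and GPU reductions exploit."""
    partials = [agg_union(chunk) for chunk in chunks]  # per-worker partials
    return agg_union(partials)                          # final combine step

a = frozenset({(0, 0), (0, 1)})
b = frozenset({(0, 1), (1, 1)})
c = frozenset({(2, 2)})
# Splitting the input across "workers" does not change the result:
assert parallel_union([[a], [b, c]]) == agg_union([a, b, c])
assert agg_intersection([a, b]) == {(0, 1)}
```

Swapping the combining function is all it takes to obtain a different aggregate, which mirrors the abstract's claim that one algorithm can compute all the proposed aggregates.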
Journal: International Workshop on Analytics for Big Geospatial Data