17th International Symposium on Spatial and Temporal Databases: Latest Publications

NALMO: A Natural Language Interface for Moving Objects Databases

17th International Symposium on Spatial and Temporal Databases

Pub Date : 2021-08-23 DOI: 10.1145/3469830.3470894

Xieyang Wang, Jianqiu Xu, Hua Lu

Moving objects databases (MODs) have been extensively studied due to their wide variety of applications, including traffic management, tourist services and mobile commerce. However, MODs still do not support queries in natural language. Since most users are not familiar with structured query languages, it is important to bridge the gap between natural language and the underlying MOD system commands. Motivated by this, we design NALMO, a natural language interface for moving objects. In general, we combine semantic parsing with a location knowledge base and domain-specific rules to interpret natural language queries. We design a corpus of moving objects queries for model training; the trained model is then used to determine the query type. Entities extracted during parsing are mapped through deterministic rules to compose the query. NALMO translates moving objects queries into structured (executable) languages effectively. We support four kinds of queries: time interval queries, range queries, nearest neighbor queries and trajectory similarity queries. We implement the system in the prototype system SECONDO and evaluate our approach using 240 natural language queries extracted from popular conference and journal papers in the moving objects domain. Experimental results show that (i) NALMO achieves an accuracy of 98.1 and a precision of 88.1, and (ii) the average time to translate a query is 1.47 s.
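
To make the "extract entities, then compose by deterministic rules" step concrete, here is a toy sketch. Everything in it is hypothetical: the keyword cues stand in for NALMO's trained query-type model, the small `LOCATIONS` dict stands in for its location knowledge base, and the output is a generic dictionary rather than SECONDO's executable syntax.

```python
import re

# Hypothetical stand-in for a location knowledge base: place name -> (x, y).
LOCATIONS = {"central station": (13.37, 52.52), "airport": (13.50, 52.36)}

# Hypothetical keyword cues; NALMO itself uses a trained model for this step.
QUERY_TYPE_CUES = [
    ("trajectory_similarity", ("similar",)),
    ("nearest_neighbor", ("nearest", "closest")),
    ("range", ("within", "inside", "around")),
    ("time_interval", ("between", "during")),
]


def classify(question: str) -> str:
    """Return the first query type whose cue words occur in the question."""
    text = question.lower()
    for query_type, cues in QUERY_TYPE_CUES:
        if any(cue in text for cue in cues):
            return query_type
    return "unknown"


def compose(question: str) -> dict:
    """Map extracted entities into a structured query via fixed rules."""
    text = question.lower()
    query = {"type": classify(question)}
    # Entity: a known location name, resolved through the knowledge base.
    for name, coords in LOCATIONS.items():
        if name in text:
            query["location"] = coords
    # Entity: a number followed by a unit, e.g. "500 meters" or "3 taxis".
    match = re.search(r"(\d+)\s*(meters|m|taxis|cars)\b", text)
    if match:
        value, unit = int(match.group(1)), match.group(2)
        key = "radius_m" if unit in ("meters", "m") else "k"
        query[key] = value
    return query


print(compose("Find the 3 taxis nearest to the central station"))
```

Keeping the composition step rule-based means that once the query type and entities are known, the same input always yields the same structured query, which is easy to audit.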

Citations: 2

CACSE: Context Aware Clustering of Stellar Evolution

17th International Symposium on Spatial and Temporal Databases

Pub Date : 2021-08-18 DOI: 10.1145/3469830.3470916

Xu Teng, Adam Corpstein, Joel Holm, Willis Knox, Becker Mathie, Philip R. O. Payne, Ethan Vander Wiel, Prabin Giri, Goce Trajcevski, A. Dotter, J. Andrews, S. Coughlin, Y. Qin, J. G. Serra-Perez, N. Tran, Jaime Roman-Garja, K. Kovlakas, E. Zapartas, S. Bavera, D. Misra, T. Fragos

We present CACSE, a system for Context Aware Clustering of Stellar Evolution, for datasets describing the temporal evolution of stars. These datasets are multivariate time series, usually with a large number of attributes (e.g., ≥ 40). Typically, they are obtained by simulation and are relatively large (5 ∼ 10 GB per interval of values for various initial conditions). Investigating common evolutionary trends in these datasets often depends on the context: not all attributes are always of interest, and among the context-relevant attributes, some may have more impact than others. To enable such context-aware clustering, CACSE allows domain experts to dynamically select the attributes that matter and assign them the desired weights/priorities. The system consists of a PostgreSQL database, Python-based middleware built with a RESTful interface and the Django framework, and a web-based user interface as the frontend. The user interface provides multiple interactive options, including the selection of datasets and preferred attributes along with their corresponding weights. Users can then select a time instant or a time range to visualize the resulting clusters. Thus, CACSE enables the detection of changes in the set of clusters (i.e., convoys) of stellar evolution tracks. The current version provides two of the most popular clustering algorithms: k-means and DBSCAN.
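
Attribute selection and weighting can be folded into k-means by restricting the distance to the chosen attributes and scaling each attribute's squared difference by its weight. The following is a minimal pure-Python sketch under that assumption; the abstract does not describe CACSE's exact weighting scheme, and `weighted_kmeans`, the attribute names and the toy star records are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def weighted_distance(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance where attribute i contributes w[i] * (a[i] - b[i])**2."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def weighted_kmeans(rows: List[Dict[str, float]], weights: Dict[str, float],
                    k: int, iterations: int = 20) -> List[int]:
    """Cluster rows using only the attributes named in `weights`."""
    attrs = sorted(weights)                      # context-relevant attributes only
    w = [weights[a] for a in attrs]
    points = [[row[a] for a in attrs] for row in rows]
    centroids = [list(p) for p in points[:k]]    # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid under the weighted distance.
        labels = [min(range(k), key=lambda c: weighted_distance(p, centroids[c], w))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical snapshot of four stellar tracks at one time instant.
stars = [
    {"mass": 1.0, "luminosity": 1.0, "age": 5.0},
    {"mass": 9.0, "luminosity": 1.2, "age": 5.1},
    {"mass": 1.1, "luminosity": 50.0, "age": 5.0},
    {"mass": 8.8, "luminosity": 51.0, "age": 4.9},
]
print(weighted_kmeans(stars, {"luminosity": 1.0}, k=2))
print(weighted_kmeans(stars, {"mass": 1.0}, k=2))
```

The same records group differently depending on which attributes the expert selects, which is the point of context-aware clustering; running the clustering per time instant and comparing memberships is one way to spot the convoy changes mentioned above.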

Citations: 1

Tell Me What Air You Breath, I Tell You Where You Are

17th International Symposium on Spatial and Temporal Databases

Pub Date : 2021-08-18 DOI: 10.1145/3469830.3470914

Hafsa El Hafyani, Mohammad Abboud, Jingwei Zuo, K. Zeitouni, Y. Taher

The widespread use of sensors and mobile devices, along with the new paradigm of Mobile Crowd-Sensing (MCS), allows air pollution to be monitored in urban areas. Several measurements are collected, such as particulate matter, nitrogen dioxide and others. Mining the context of MCS data in such domains is key to identifying individuals' exposure to air pollution, but it is challenging because predictors are lacking or weak. We previously developed a multi-view learning approach that learns the context solely from the sensor measurements. In this demonstration, we present a visualization tool (COMIC) that shows the different recognized contexts using an improved version of our algorithm. We also demonstrate the change points detected by a multi-dimensional change point detection (CPD) model. We leverage real data from an MCS campaign and compare different methods.
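
As a rough illustration of multi-dimensional change point detection (not the CPD model used in COMIC, which the abstract does not specify), the sketch below finds the single split of a multivariate series that minimizes the total within-segment squared error. The function names and the toy (PM2.5, NO2) readings are hypothetical.

```python
from typing import List, Sequence


def segment_cost(rows: Sequence[Sequence[float]]) -> float:
    """Sum of squared deviations from the segment mean, over all dimensions."""
    n = len(rows)
    cost = 0.0
    for dim in zip(*rows):
        mean = sum(dim) / n
        cost += sum((x - mean) ** 2 for x in dim)
    return cost


def best_change_point(series: List[Sequence[float]], min_size: int = 2) -> int:
    """Return index t such that splitting into series[:t] and series[t:] minimizes cost."""
    candidates = range(min_size, len(series) - min_size + 1)
    return min(candidates,
               key=lambda t: segment_cost(series[:t]) + segment_cost(series[t:]))


# Toy (PM2.5, NO2) readings: the sensor carrier moves from a park to a busy road.
readings = [(8, 10), (9, 11), (8, 12), (10, 11), (35, 60), (37, 58), (36, 62), (34, 61)]
print(best_change_point(readings))
```

Because the cost sums over all dimensions, a change that appears jointly in several pollutants is detected even when each single dimension shifts only moderately; detecting multiple change points would require applying the split recursively or using a dynamic-programming segmentation.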

Citations: 2

A Novel Indexing Method for Spatial-Keyword Range Queries

17th International Symposium on Spatial and Temporal Databases

Pub Date : 2021-08-18 DOI: 10.1145/3469830.3470897

Panagiotis Tampakis, Dimitris Spyrellis, C. Doulkeridis, N. Pelekis, Christos Kalyvas, Akrivi Vlachou

Spatial-keyword queries are important for a wide range of applications that retrieve data based on a combination of keyword search and spatial constraints. However, processing spatial-keyword queries efficiently is not trivial, because combining textual and spatial data yields a high-dimensional representation that is hard to index effectively. To address this problem, in this paper, we propose a novel indexing scheme for efficient support of spatial-keyword range queries. At the heart of our approach lies a carefully designed mapping of spatio-textual data to a two-dimensional (2D) space that produces compact partitions of spatio-textual data. In turn, the mapped 2D data can be indexed effectively by traditional spatial data structures, such as an R-tree. We propose bounds, with proofs of correctness, that lead to a filter-and-refine algorithm that prunes the search space effectively. As a result, our approach for spatial-keyword range queries is readily applicable to any database system that provides spatial support. In our experimental evaluation, we demonstrate how our algorithm can be implemented on top of PostgreSQL, exploiting the spatial index provided by PostGIS to process spatial-keyword range queries efficiently. Moreover, we show that our solution outperforms several competing approaches.
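
The filter-and-refine pattern itself can be sketched generically. The sketch below does not reproduce the paper's 2D mapping or its bounds: as an assumption, a uniform grid stands in for the R-tree, the filter step visits only grid cells overlapping the query rectangle, and the refine step checks exact containment and keyword coverage. `GridIndex` and `SpatioTextualObject` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class SpatioTextualObject:
    oid: int
    x: float
    y: float
    keywords: FrozenSet[str]


@dataclass
class GridIndex:
    """Uniform grid over the plane; a simple stand-in for an R-tree."""
    cell: float
    buckets: Dict[Tuple[int, int], List[SpatioTextualObject]] = field(
        default_factory=lambda: defaultdict(list))

    def _key(self, x: float, y: float) -> Tuple[int, int]:
        return (int(x // self.cell), int(y // self.cell))

    def insert(self, obj: SpatioTextualObject) -> None:
        self.buckets[self._key(obj.x, obj.y)].append(obj)

    def range_query(self, xmin: float, ymin: float, xmax: float, ymax: float,
                    terms: Set[str]) -> List[int]:
        (cx0, cy0), (cx1, cy1) = self._key(xmin, ymin), self._key(xmax, ymax)
        results = []
        # Filter: visit only grid cells overlapping the query rectangle.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj in self.buckets.get((cx, cy), []):
                    # Refine: exact spatial containment and keyword coverage.
                    if (xmin <= obj.x <= xmax and ymin <= obj.y <= ymax
                            and terms <= obj.keywords):
                        results.append(obj.oid)
        return sorted(results)


idx = GridIndex(cell=10.0)
for o in [SpatioTextualObject(1, 2.0, 3.0, frozenset({"cafe", "wifi"})),
          SpatioTextualObject(2, 4.0, 4.0, frozenset({"cafe"})),
          SpatioTextualObject(3, 25.0, 25.0, frozenset({"cafe", "wifi"}))]:
    idx.insert(o)
print(idx.range_query(0, 0, 12, 12, {"cafe", "wifi"}))
```

In this naive version the filter prunes only spatially; the paper's contribution is a mapping under which keyword information also contributes to pruning inside a standard 2D index.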

Citations: 7

Probabilistic Deep Learning for Electric-Vehicle Energy-Use Prediction

17th International Symposium on Spatial and Temporal Databases

Pub Date : 2021-08-18 DOI: 10.1145/3469830.3470915

Linas Petkevičius, Simonas Šaltenis, A. Civilis, K. Torp

The continued spread of electric vehicles (EVs) raises new challenges for the supporting digital infrastructure. For example, long-distance route planning for such vehicles relies on predicting both the expected travel time and the energy use. We envision a two-tier architecture to produce such predictions. First, a routing and travel-time-prediction subsystem generates a suggested route and predicts how the speed will vary along the route. Next, the expected energy use is predicted from this speed profile and other contextual characteristics, such as weather information and slope. To this end, the paper proposes deep-learning models built from EV tracking data. First, as the speed profile of a route is one of the main predictors of energy use, several simple ways to build speed profiles are explored. Next, eight different deep-learning models for energy-use prediction are proposed. Four of the models are probabilistic: instead of a single-point estimate, they predict the parameters of a probability distribution of energy use on the route. This is particularly relevant for EV energy use, which is highly sensitive to many input characteristics and therefore can hardly be predicted precisely. Extensive experiments with two real-world EV tracking datasets validate the proposed methods. The code for this research has been made available on GitHub.
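
A probabilistic model of this kind typically outputs distribution parameters and is trained by minimizing the negative log-likelihood (NLL) of the observed energy use. The sketch below assumes a Gaussian output (mean and standard deviation); the abstract does not say which distributions the four probabilistic models use. It also shows one simple, hypothetical way to build a speed profile: averaging observed speeds over fixed-length route segments.

```python
import math
from typing import Dict, List, Sequence


def speed_profile(positions_m: Sequence[float], speeds_kmh: Sequence[float],
                  segment_m: float) -> List[float]:
    """Average observed speed per fixed-length route segment (hypothetical profile)."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for pos, v in zip(positions_m, speeds_kmh):
        seg = int(pos // segment_m)
        sums[seg] = sums.get(seg, 0.0) + v
        counts[seg] = counts.get(seg, 0) + 1
    return [sums[s] / counts[s] for s in sorted(sums)]


def gaussian_nll(y: float, mu: float, sigma: float) -> float:
    """Negative log-likelihood of y under N(mu, sigma**2); sigma must be > 0."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def mean_nll(ys: Sequence[float], mus: Sequence[float], sigmas: Sequence[float]) -> float:
    """Training loss over a batch of trips: average per-trip NLL."""
    return sum(gaussian_nll(y, m, s) for y, m, s in zip(ys, mus, sigmas)) / len(ys)


# A model predicting 12 kWh with sigma = 1 kWh for a trip that used 14 kWh:
print(round(gaussian_nll(14.0, 12.0, 1.0), 3))
```

Under this loss, a wider predicted sigma reduces the penalty for a large error at the cost of a log-sigma term, so the model learns to report higher uncertainty exactly where energy use is hard to predict.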

Citations: 9

Geo-Quantities: A Framework for Automatic Extraction of Measurements and Spatial Context from Scientific Documents

17th International Symposium on Spatial and Temporal Databases

Pub Date : 2021-08-18 DOI: 10.1145/3469830.3470911

T. Petersen, M. A. Suryani, C. Beth, Hardik Patel, K. Wallmann, M. Renz

Quantitative information derived from scientific documents provides an important source of data for studies in almost all domains; however, manual extraction of this information is very time-consuming. In this paper, we introduce Geo-Quantities, a system that supports the automatic extraction of quantitative, spatial and temporal information about a given measurement entity from scientific literature using text mining techniques. Automatic measurement recognition is difficult mainly because of the diverse ways in which measurements are expressed in papers. Geo-Quantities offers an interactive interface for visualizing the extracted user-defined information, in particular its spatial and temporal context. In our demonstration, we showcase the capabilities of our system by retrieving measurements such as “mass accumulation rates” and “sedimentation rates” from scientific publications in marine geology, which could have a high impact on studies that build global mass accumulation rate maps. For training and evaluating Geo-Quantities, we use a corpus of domain-relevant papers.
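
To make the extraction task concrete, here is a deliberately naive regular-expression baseline that pulls an entity name, a value and a unit out of running text. It is an illustration only, not the Geo-Quantities method: as the abstract notes, measurement expressions vary widely, which is exactly why a fixed pattern like this one falls short. The pattern, the verb list and the unit list are hypothetical.

```python
import re
from typing import List, NamedTuple


class Measurement(NamedTuple):
    entity: str
    value: float
    unit: str


# Hypothetical pattern: "<entity> <of|was|were|is|are> [about] <number> <unit>".
PATTERN = re.compile(
    r"(?P<entity>mass accumulation rates?|sedimentation rates?)"
    r"\s+(?:of|was|were|is|are)\s+(?:about\s+|approximately\s+|~\s*)?"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>g/cm2/kyr|cm/kyr|m/Myr)",
    re.IGNORECASE,
)


def extract(text: str) -> List[Measurement]:
    """Return every (entity, value, unit) triple matched by the fixed pattern."""
    return [Measurement(m.group("entity").lower(), float(m.group("value")), m.group("unit"))
            for m in PATTERN.finditer(text)]


sentence = ("Sedimentation rates of 12.5 cm/kyr were observed, while mass "
            "accumulation rates were about 3.1 g/cm2/kyr at the site.")
print(extract(sentence))
```

A phrasing such as "rates ranged from 2 to 5 cm per thousand years" would slip past this pattern entirely, and the spatial context (the core site or region) is not captured at all; both gaps motivate a learned extraction pipeline.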

Citations: 2

Mining High Resolution Earth Observation Data Cubes

17th International Symposium on Spatial and Temporal Databases

Pub Date : 2021-08-18 DOI: 10.1145/3469830.3470917

Andreas Zuefle, K. Wessels, D. Pfoser

Earth observation data is collected by ever-expanding fleets of satellites, including Landsat 1-8, Sentinel-1 and Sentinel-2, SPOT 1-7 and WorldView 1-3. These satellites capture imagery at spatial resolutions (pixel sizes) from 30 m to 31 cm and provide revisit rates as frequent as every 5 days. This allows us not only to look at high-resolution images of every corner of the Earth, but also to track events and observe change over time. During the past 5 years, medium spatial resolution satellite data (30 m to 10 m pixels) has reached very high temporal revisit frequencies of 5-16 days, and spatio-temporal structures have been developed to manage these vast data sets. However, high-resolution satellite images and rapidly increasing revisit rates create major data management and mining challenges. This work discusses six challenges of integrating observations made at different times, by different sensors, at different spatial resolutions and at different temporal frequencies into a unified Earth Observation Data Cube, that is, a tensor of location, time and spectral bands. Challenges include creating a unified data cube from heterogeneous sensors, scaling geo-registration (mapping pixels between images), accounting for uncertainty across observations, imputing missing observations, broad-area event detection and, ultimately, predicting the future state of our planet. With such a unified Earth Observation Data Cube in place, we describe potential application areas such as detecting anthropogenic land cover change, early warning of natural hazards, tracing animal movements, finding missing airplanes and rapidly detecting forest fires.
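
A data cube in this sense is a tensor indexed by (time, row, column, band). As a minimal sketch of one listed challenge, imputing missing observations, the code below fills gaps (for example, cloud-masked pixels) in each pixel's time series by linear interpolation between the nearest valid observations. The nested-list layout, the use of `None` for missing values and the choice of linear interpolation are all assumptions; the abstract does not prescribe an imputation method.

```python
from typing import List, Optional

# cube[t][row][col][band]; None marks a missing observation (assumed layout).
Cube = List[List[List[List[Optional[float]]]]]


def impute_linear(cube: Cube) -> Cube:
    """Fill interior gaps along the time axis by linear interpolation per pixel and band."""
    n_times = len(cube)
    rows, cols, bands = len(cube[0]), len(cube[0][0]), len(cube[0][0][0])
    out = [[[list(px) for px in row] for row in frame] for frame in cube]  # deep copy
    for r in range(rows):
        for c in range(cols):
            for b in range(bands):
                series = [cube[t][r][c][b] for t in range(n_times)]
                known = [t for t, v in enumerate(series) if v is not None]
                for t in range(n_times):
                    if series[t] is not None:
                        continue
                    before = [k for k in known if k < t]
                    after = [k for k in known if k > t]
                    if not before or not after:
                        continue  # leave leading/trailing gaps untouched
                    t0, t1 = before[-1], after[0]
                    v0, v1 = series[t0], series[t1]
                    out[t][r][c][b] = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return out


# One pixel, two bands, four acquisition dates with cloud gaps at t = 1 and t = 2.
cube = [[[[0.10, 0.50]]], [[[None, None]]], [[[0.30, None]]], [[[0.40, 0.80]]]]
print(impute_linear(cube))
```

Linear interpolation ignores seasonality and sensor differences; with heterogeneous sensors and uncertain observations, per-observation weights or a model-based imputation would be the natural next step.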

Citations: 2

Privacy-Preserving Synthetic Location Data in the Real World

17th International Symposium on Spatial and Temporal Databases

Pub Date : 2021-08-04 DOI: 10.1145/3469830.3470893

Teddy Cunningham, Graham Cormode, H. Ferhatosmanoğlu

Sharing sensitive data is vital for many modern data analysis and machine learning tasks. However, current methods for data release are not accurate or granular enough to provide meaningful utility, and they carry a high risk of deanonymization or membership inference attacks. In this paper, we propose a differentially private synthetic data generation solution focused on the compelling domain of location data. We present two methods of high practical utility for generating synthetic location data from real locations, both of which protect the existence and true location of each individual in the original dataset. Our first, partitioning-based approach introduces a novel method for privately generating point data using kernel density estimation, and employs private adaptations of classic statistical techniques, such as clustering, for private partitioning. Our second, network-based approach incorporates public geographic information, such as the road network of a city, to constrain the bounds of synthetic data points and hence improve the accuracy of the synthetic data. Both methods satisfy the requirements of differential privacy while generating accurate synthetic data that aims to preserve the distribution of the real locations. We conduct experiments on three large-scale location datasets to show that the proposed solutions generate synthetic location data with high utility and strong similarity to the real datasets. We highlight practical applications of our work by running a range of location analytics queries on our synthetic data, and we show that it produces near-identical answers to those obtained on the real data. Our results show that the proposed approaches are practical solutions for sharing and analyzing sensitive location data privately.
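
As a much simpler relative of the partitioning-based approach (not either of the paper's methods), the sketch below counts points in a uniform grid, adds Laplace noise with scale 1/ε to each count, and then samples synthetic points uniformly inside cells in proportion to the clipped noisy counts. Assumptions: each individual contributes one point and neighbouring datasets differ by adding or removing one person (so each count has sensitivity 1), the grid and bounding box are public, `n_out` is chosen independently of the data, and the function names are hypothetical. Python's `random` module is not a secure noise source, so this is for illustration only.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def laplace(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_points(points: List[Point], bbox: Tuple[float, float, float, float],
                        cells: int, epsilon: float, n_out: int,
                        seed: int = 0) -> List[Point]:
    """Noisy-grid synthesis: Laplace-perturbed cell counts, then uniform sampling."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    w, h = (xmax - xmin) / cells, (ymax - ymin) / cells
    counts: Dict[Tuple[int, int], int] = {(i, j): 0 for i in range(cells) for j in range(cells)}
    for x, y in points:
        i = min(max(int((x - xmin) / w), 0), cells - 1)
        j = min(max(int((y - ymin) / h), 0), cells - 1)
        counts[(i, j)] += 1
    # Adding/removing one person changes one count by 1, so scale 1/epsilon gives epsilon-DP.
    noisy = {cell: max(0.0, c + laplace(rng, 1.0 / epsilon)) for cell, c in counts.items()}
    keys = list(noisy)
    weights = [noisy[k] for k in keys]
    if sum(weights) == 0:
        return []
    # Clipping and sampling are post-processing, so they do not weaken the guarantee.
    chosen = rng.choices(keys, weights=weights, k=n_out)
    return [(xmin + (i + rng.random()) * w, ymin + (j + rng.random()) * h)
            for i, j in chosen]


real = [(0.1, 0.1)] * 200 + [(0.9, 0.9)] * 5
synthetic = dp_synthetic_points(real, (0.0, 0.0, 1.0, 1.0), cells=2, epsilon=1.0, n_out=100)
print(synthetic[:3])
```

A uniform grid wastes privacy budget on empty cells and blurs dense areas; adaptive partitioning (as in the paper's first method) or constraining samples to a road network (as in its second) addresses exactly these weaknesses.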

Citations: 10

Sequential Recommendation in Online Games with Multiple Sequences, Tasks and User Levels

17th International Symposium on Spatial and Temporal Databases

Pub Date : 2021-02-13 DOI: 10.1145/3469830.3470906

Si Chen, Yuqiu Qian, Hui Li, Chen Lin

Online gaming is growing faster than ever before, and with it the challenge of providing a better user experience. Recommender systems (RS) for online games face unique challenges since they must fulfill players’ distinct desires, at different user levels, based on their action sequences of various action types. Although many sequential RS already exist, they are mainly single-sequence, single-task, and single-user-level. In this paper, we introduce a new sequential recommendation model for multiple sequences, multiple tasks, and multiple user levels (abbreviated as M3Rec) on the Tencent Games platform, which can fully utilize the complex data in online games. We leverage graph neural networks and multi-task learning to design M3Rec in order to model the complex information in the heterogeneous sequential recommendation scenario of Tencent Games. We verify the effectiveness of M3Rec on three online games of the Tencent Games platform, in both offline and online evaluations. The results show that M3Rec successfully addresses the challenges of recommendation in online games, and it generates superior recommendations compared with state-of-the-art sequential recommendation approaches.

Citations: 1