A spatial skyline query finds the data points that are not spatially dominated by any other data point, given a set of data points P and a set of query points Q in a multidimensional space. The query enumerates the skyline points based on their distances to the query points. However, existing spatial skyline queries can deviate substantially from actual travel distances in geo-spaces because they are based on the Euclidean distance. We propose a spatial skyline query on triangulated irregular networks (TINs), which are frequently used to represent terrain surfaces. We define a new spatial skyline query based on more accurate travel distances, using the TIN distance instead of the Euclidean distance. We also propose an efficient solution method that uses indexes to find nearest-neighbor points in TIN space and to reduce the numbers of unnecessary data points and TIN vertices. The proposed method achieves a computational complexity of $O(|P'||Q|N'^2 + |P'|^2|Q|)$, where $P'$ is the reduced set of data points and $N'$ is the reduced number of TIN vertices, both based on the range of the query points. The naive method, by comparison, requires $\Theta(|P||Q|N^2 + |P|^2|Q|)$, where $N$ is the number of TIN vertices. Moreover, experiments verify that the proposed method is faster than the naive method because it uses a spatial index to reduce the numbers of unnecessary data points and TIN vertices.
Spatial Skyline Queries on Triangulated Irregular Networks. Yuta Kasai, Kento Sugiura, Y. Ishikawa. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470901
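The dominance test behind a spatial skyline query can be illustrated with a minimal Python sketch. It is a naive baseline, not the indexed method the abstract proposes: the function names are hypothetical, and the `distance` parameter defaults to Euclidean distance purely for illustration. A TIN surface distance would be passed in its place.

```python
from math import dist


def dominates(p, other, queries, distance=dist):
    """Return True if p spatially dominates `other` with respect to `queries`.

    p dominates `other` when it is at least as close to every query point
    and strictly closer to at least one of them.
    """
    d_p = [distance(p, q) for q in queries]
    d_o = [distance(other, q) for q in queries]
    no_worse = all(a <= b for a, b in zip(d_p, d_o))
    strictly_better = any(a < b for a, b in zip(d_p, d_o))
    return no_worse and strictly_better


def spatial_skyline(points, queries, distance=dist):
    """Naive skyline: keep every point that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(o, p, queries, distance) for o in points)
    ]


# Illustrative data (made up for this sketch).
points = [(1, 1), (2, 2), (0, 0.5), (5, 5)]
queries = [(0, 0), (2, 0)]
skyline = spatial_skyline(points, queries)
```

The pairwise dominance check compares every pair of data points against every query point, which corresponds to the $|P|^2|Q|$ term in the naive bound; the remaining term comes from computing the distances themselves.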
Abdullah AlDwyish, E. Tanin, Hairuo Xie, S. Karunasekera, K. Ramamohanarao
Traffic forecasting plays a vital role in traffic management systems. Recently, deep learning models have been applied to citywide traffic forecasting. However, the existing work models and predicts traffic at a single (dense) resolution, making it challenging to capture long-range spatial dependencies or high-level traffic dynamics. This shortcoming limits the accuracy of prediction and results in computationally expensive models. We propose a traffic forecasting model based on deep convolutional networks to improve the accuracy of citywide traffic forecasting. Our model uses a hierarchical architecture that captures traffic dynamics at multiple spatial resolutions. Based on this architecture, we apply a multi-task learning scheme, which trains the model to predict traffic at different resolutions. Our model helps provide a coherent understanding of traffic dynamics by capturing spatial dependencies between different regions of a city. Experimental results on multiple real datasets show that our model can achieve competitive results compared to complex state-of-the-art approaches while being more computationally efficient.
Effective Traffic Forecasting with Multi-Resolution Learning. Abdullah AlDwyish, E. Tanin, Hairuo Xie, S. Karunasekera, K. Ramamohanarao. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470904
Songnian Zhang, S. Ray, Rongxing Lu, Yandong Zheng
A body of recent work has shown that learned indexes can improve query performance while reducing storage overhead. This offers an opportunity to address the spatial query processing challenges caused by the surge in location-based services. Although several learned indexes have been proposed to process spatial data, the main idea behind these approaches is to reuse existing one-dimensional learned models, which requires either converting the spatial data into one-dimensional data or applying the learned model to individual dimensions separately. As a result, these approaches cannot take full advantage of the information about the spatial distribution of the original data. To this end, in this paper, we exploit this information by using a spatial (multi-dimensional) interpolation function as the learned model, which can be employed directly on the spatial data. Specifically, we design an efficient SPatial inteRpolation functIon based Grid index (SPRIG) to process range and kNN queries. Detailed experiments are conducted on real-world datasets. The results indicate that, compared to traditional spatial indexes, our proposed learned index can significantly improve index building and query processing performance with less storage overhead. Moreover, in the best case, our index achieves up to an order of magnitude better performance than ZM-index in range queries and is about 2.7×, 3×, and 9× faster than the multi-dimensional learned index Flood in terms of index building, range queries, and kNN queries, respectively.
SPRIG: A Learned Spatial Index for Range and kNN Queries. Songnian Zhang, S. Ray, Rongxing Lu, Yandong Zheng. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470892
We present new generic methods to efficiently draw schematized metro maps for a wide variety of layouts, including octilinear, hexalinear, and orthoradial maps. The maps are drawn by mapping the input graph to a suitable grid graph. Previous work was restricted to regular octilinear grids. In this work, we investigate a variety of grids, including triangular grids and orthoradial grids. In particular, we also construct sparse grids where the local node density adapts to the input graph (e.g., octilinear Hanan grids, which we introduce in this work). For octilinear maps, this reduces the grid size by a factor of up to 5 compared to previous work, while still achieving close-to-optimal layouts. For many maps, this reduction also makes solving the underlying optimization problem up to 5 times faster. We evaluate our approach on five maps. All octilinear maps can be computed in under 0.5 seconds, and all hexalinear and orthoradial maps can be computed in under 2.5 seconds.
Metro Maps on Flexible Base Grids. H. Bast, P. Brosi, Sabine Storandt. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470899
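To make the idea of a sparse grid that adapts to the input more concrete, the following Python sketch builds a classic (rectilinear) Hanan grid: the grid induced by drawing a horizontal and a vertical line through every input point. This is only the textbook construction, offered as an assumption-laden illustration; the octilinear Hanan grid introduced in the paper is a different, extended structure that is not reproduced here, and the function name `hanan_grid` is hypothetical.

```python
def hanan_grid(points):
    """Build the classic Hanan grid for a set of 2D points.

    Nodes are all intersections of the horizontal and vertical lines
    through the input points; edges connect neighbouring intersections.
    """
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    nodes = [(x, y) for x in xs for y in ys]
    edges = []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if i + 1 < len(xs):  # edge to the next grid column
                edges.append(((x, y), (xs[i + 1], y)))
            if j + 1 < len(ys):  # edge to the next grid row
                edges.append(((x, y), (x, ys[j + 1])))
    return nodes, edges


# Three illustrative station positions (made up for this sketch).
stations = [(0, 0), (2, 1), (5, 3)]
nodes, edges = hanan_grid(stations)
```

The grid has nodes only where input coordinates occur, so its size depends on the number of distinct coordinates rather than on the extent of the map, which is the property that makes such grids sparse.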
Utility systems such as electric, fiber/telco, gas, and water require the realistic modeling of network attributes or values over distance. For example, consider hydraulic pressure in a pipe network; as water flows away from the reservoir or pump, pressure decreases due to pipe friction, leakage, consumption, etc. Attribute propagation is the process whereby network attributes that change over distance (e.g., maximum allowable operating pressure, phase, etc.) are calculated and maintained. This is important for improving safety as well as efficiency. However, attribute propagation is challenging due to the size of the data, which could have tens of millions of nodes and edges per utility, and billions of nodes and edges at the nationwide scale. Additionally, results may need to be calculated and available quickly for interactive analysis. Previous approaches require immediate updates to all nodes and edges downstream of a node/edge being edited (to account for changes in attribute values), which could be computationally intensive and result in a slow user experience for editing attribute values. This paper presents Propagators, which feature an in-memory approach to attribute propagation. Propagators leverage a network index as well as a heuristic based on colocated sources with similar attribute values to increase computational savings. We present experiments that demonstrate the scalability of Propagators, which have been implemented in ArcGIS Pro and ArcGIS Enterprise.
Attribute Propagation for Utilities. Dev Oliver, P. Bakalov, Sangho Kim, E. Hoel. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470907
A. Tritsarolis, Y. Kontoulis, N. Pelekis, Y. Theodoridis
The massive-scale data generation of positioning (tracking) messages, collected by various surveillance means, has posed new challenges in the field of mobility data analytics in terms of extracting valuable knowledge out of this data. One of these challenges is online cluster analysis, where the goal is to unveil hidden patterns of collective behaviour from streaming trajectories, such as co-movement and co-stationary (aka anchorage) patterns. Towards this direction, in this paper, we demonstrate MaSEC (Moving and Stationary Evolving Clusters), a system that discovers valuable behavioural patterns as above. In particular, our system provides a unified solution that discovers both moving and stationary evolving clusters on streaming vessel position data in an online mode. The functionality of our system is evaluated over two real-world datasets from the maritime domain.
MaSEC: Discovering Anchorages and Co-movement Patterns on Streaming Vessel Trajectories. A. Tritsarolis, Y. Kontoulis, N. Pelekis, Y. Theodoridis. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470909
Maintaining a database with the type, location, and direction of traffic signs is a labor-intensive part of asset management for many road authorities. Today, cell phones have high-quality cameras that can add location (EXIF) metadata to images. This makes it efficient and cheap to collect large geo-located imagery datasets. Detecting traffic signs in imagery is also much simpler today due to the availability of several high-quality open-source object-detection solutions. In this paper, we use the detection of traffic signs to find both the location and the direction of physical traffic signs. Five approaches to cluster the detections are presented. An extensive experimental evaluation shows that it is important to consider both the location and the direction. The evaluation is done on a novel dataset with 21,565 images that is freely available for download. This includes the ground-truth location of 277 traffic signs and all source code. The conclusion is that traffic signs are detected with an F1 score of 0.8889, a location accuracy of 5.097 meters (MAE), and a direction accuracy of ±11.375° (MAE). Only data from two trips are needed to get these results.
Geolocating Traffic Signs using Large Imagery Datasets. Kasper F. Pedersen, K. Torp. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470900
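The three reported metrics (F1 score, location MAE, direction MAE) can be computed as in the Python sketch below. It is a generic illustration, not the authors' evaluation code: the function names and sample numbers are made up, and the wrap-around handling for directions is an assumption about how a direction error would sensibly be measured, since a heading of 350° is only 20° away from 10°.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from detection counts."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def angular_error(predicted_deg, actual_deg):
    """Smallest absolute difference between two headings, in degrees."""
    diff = abs(predicted_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_error(errors):
    """Mean of already-absolute error values."""
    return sum(errors) / len(errors)


# Illustrative values only; not taken from the paper's dataset.
direction_errors = [angular_error(p, a) for p, a in [(350, 10), (90, 100), (180, 180)]]
direction_mae = mean_absolute_error(direction_errors)
```

Without the wrap-around step, the first pair in the example would contribute an error of 340° instead of 20° and dominate the mean.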
Carola Trahms, P. Handmann, W. Rath, M. Visbeck, M. Renz
The distribution of passively drifting particles within highly turbulent flows is a classic problem in marine sciences. The use of trajectory clustering on huge amounts of simulated marine trajectory data to identify main pathways of drifting particles has not been widely investigated from a data science perspective yet. In this paper, we propose a fast and computationally light method to efficiently identify main pathways in large amounts of trajectory data. It aims at overcoming some of the issues of probabilistic maps and existing trajectory clustering approaches. Our approach is evaluated against simulated larvae dispersion data based on a real-world model that have been produced as part of work in the marine science domain.
Where have all the larvae gone? Towards Fast Main Pathway Identification from Geospatial Trajectories. Carola Trahms, P. Handmann, W. Rath, M. Visbeck, M. Renz. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470896
Adverse side effects of a drug may vary over space and time due to different populations, environments, and drug quality. Discovering all side effects during the development process is impossible. Once a drug is approved, observed adverse effects are reported by doctors and patients and made available in the Adverse Event Reporting System provided by the U.S. Food and Drug Administration. Mining such records of reported adverse effects, this study proposes a spatial clustering approach to identify regions that exhibit similar adverse effects. We apply a topic modeling approach to textual representations of reported adverse effects using Latent Dirichlet Allocation. By describing a spatial region as a mixture of the resulting latent topics, we find clusters of regions that exhibit similar (topics of) adverse events for the same drug using Hierarchical Agglomerative Clustering. We investigate the resulting clusters for spatial autocorrelation, using Moran's I measure, to test the hypothesis that certain (topics of) adverse effects may occur only in certain spatial regions. As an example, our experimental evaluation applies the proposed framework to a number of blood-thinning drugs, showing that some drugs exhibit more coherent textual topics among their reported adverse effects than other drugs, but finding no significant spatial autocorrelation of these topics. Our approach can be applied to other drugs or vaccines to study whether spatially localized adverse effects may justify further investigation.
Clustering of Adverse Events of Post-Market Approved Drugs. Ahmed Askar, Andreas Zuefle. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470903
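Moran's I, the autocorrelation measure named in the abstract, is defined as $I = \frac{n}{W} \cdot \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$, where $w_{ij}$ are spatial weights and $W$ is their sum. The Python sketch below computes it for a small example. The binary adjacency weights and the sample values are assumptions made for illustration; the abstract does not state which weighting scheme the study used.

```python
def morans_i(values, weights):
    """Moran's I for region values and an n-by-n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_term = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / variance_term)


# Four regions in a row; each region neighbours the next one (assumed layout).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], chain)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], chain)  # neighbours always differ
```

Positive values indicate that similar values cluster in space and negative values indicate that neighbours tend to differ; under no spatial autocorrelation the expected value is $-1/(n-1)$, which is close to zero for large $n$.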
Jayant Gupta, A. Long, C. Xu, Tian Tang, S. Shekhar
Spatial data brings an important dimension to AI's quest for algorithmic transparency. For example, data-driven, computer-aided policy decisions use measures of segregation (e.g., dissimilarity index) or income inequality (e.g., Gini index), and these measures are affected by the choice of space partitioning. This may lead policymakers to underestimate the level of inequality or segregation within a region. The problem stems from the fact that many segregation-based analyses use aggregated census data but do not report the sensitivity of their results to the choice of spatial partitioning (e.g., census block, tract). Beyond the well-known Modifiable Areal Unit Problem, this paper shows (via mathematical proofs as well as case studies with census data and census-based synthetic micro-population data) that the values of many measures (e.g., Gini index, dissimilarity index) diminish monotonically with increasing spatial-unit size in a hierarchical space partitioning (e.g., block, block group, tract); however, the ranking based on spatially aggregated measures remains sensitive to the scale of the spatial partitions (e.g., block, block group). This paper highlights the need for social scientists to report how rankings of inequality are affected by the choice of spatial partitions.
Spatial Dimensions of Algorithmic Transparency: A Summary. Jayant Gupta, A. Long, C. Xu, Tian Tang, S. Shekhar. 17th International Symposium on Spatial and Temporal Databases, 2021-08-23. https://doi.org/10.1145/3469830.3470898
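The effect of aggregation on a segregation measure can be seen in a small worked example. The Python sketch below uses the standard dissimilarity index, $D = \frac{1}{2}\sum_i \left| \frac{a_i}{A} - \frac{b_i}{B} \right|$, where $a_i$ and $b_i$ are the counts of two groups in unit $i$ and $A$, $B$ are the group totals. The block counts and the block-to-tract grouping are made up for illustration and are not taken from the paper's case studies.

```python
def dissimilarity_index(units):
    """Dissimilarity index for a list of (group_a_count, group_b_count) units."""
    total_a = sum(a for a, _ in units)
    total_b = sum(b for _, b in units)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in units)


def aggregate(units, groups):
    """Merge fine units into coarser ones; `groups` lists the indices to merge."""
    return [
        (sum(units[i][0] for i in g), sum(units[i][1] for i in g))
        for g in groups
    ]


# Four hypothetical census blocks: two fully segregated, two evenly mixed.
blocks = [(10, 0), (0, 10), (5, 5), (5, 5)]
# Hypothetical tracts: blocks 0 and 1 form one tract, blocks 2 and 3 another.
tracts = aggregate(blocks, [[0, 1], [2, 3]])

d_block = dissimilarity_index(blocks)  # 0.5
d_tract = dissimilarity_index(tracts)  # 0.0
```

Merging the two segregated blocks into one tract hides the segregation entirely. In general the index cannot increase under such merging, because the absolute value of a sum is at most the sum of the absolute values.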