Top-k spatial-textual queries have received significant attention in the research community. Several techniques to efficiently process this class of queries are now widely used in a variety of applications. However, the problem of how best to process multiple queries efficiently is not well understood. Applications that process continuous streams of queries, or that pre-process other queries offline, could benefit from solutions to this problem. In this work, we study practical solutions to efficiently process a set of top-k spatial-textual queries. We propose an efficient best-first algorithm for the batch processing of top-k spatial-textual queries that promotes shared processing and reduced I/O in each query batch. By grouping similar queries and processing them simultaneously, we demonstrate significant performance gains on publicly available datasets.
Title: Batch processing of Top-k Spatial-textual Queries. Authors: F. Choudhury, J. Culpepper, T. Sellis. Venue: Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data. Published: 2015-05-31. DOI: https://doi.org/10.1145/2786006.2786008
K. Mouratidis, M. Renz, Tobias Emrich, Andreas Züfle, K. Janowicz
The aim of the GeoRich workshop is to provide a unique forum for discussing in depth the challenges, opportunities, novel techniques and applications on modeling, managing, searching and mining rich geospatial data, in order to fuel scientific research on big spatial data applications beyond the current research frontiers. The workshop is intended for researchers working on multidisciplinary topics who want to discuss problems and synergies. Following the success of the inaugural GeoRich in 2014, GeoRich'15 is the second event in the series.
Title: Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data. Authors: K. Mouratidis, M. Renz, Tobias Emrich, Andreas Züfle, K. Janowicz. Venue: Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data. Published: 2015-05-31. DOI: https://doi.org/10.1145/2786006
We say that an object o attracts a user u if o is one of the top-k objects according to the preference function defined by u. Given a set of objects (e.g., restaurants) and a set of users, we study the problem of computing a set of representative objects considering two criteria: coverage and diversity. The coverage of a set S of objects is the number of distinct users attracted by the objects in S. Although a set of objects with high coverage attracts a large number of users, it is possible that all of these users have quite similar preferences. Consequently, the set of objects may be attractive only to a specific class of users with similar preference functions, which may disappoint other users with widely different preferences. The diversity criterion addresses this issue by selecting a set S of objects such that the set of users attracted by each object in S is as different as possible from the sets of users attracted by the other objects in S. Existing work on representative objects considers only one of the two criteria. We are the first to consider both, with the importance of each criterion controlled by a parameter. Our algorithm has two phases. In the first phase, we prune the objects that cannot be among the representative objects and compute the set of attracted users (also called the reverse top-k) for each of the remaining objects. In the second phase, the reverse top-k sets of these objects are used to compute the representative objects maximizing coverage and diversity. Since this problem is NP-hard, the second phase employs a greedy algorithm. For time and space efficiency, we adopt MinHash and KMV synopses to assist the set operations. We prove that the proposed greedy algorithm is ϵ-approximate. Our extensive experimental study on real and synthetic data sets demonstrates the effectiveness of our proposed techniques.
Title: Selecting Representative Objects Considering Coverage and Diversity. Authors: Shenlu Wang, M. A. Cheema, Ying Zhang, Xuemin Lin. Venue: Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data. Published: 2015-05-31. DOI: https://doi.org/10.1145/2786006.2786012
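The second phase described in the abstract above (greedy selection over reverse top-k sets) can be illustrated with a minimal sketch. Several things here are assumptions made for illustration and are not taken from the paper: the reverse top-k sets are exact Python sets rather than MinHash or KMV synopses, the objective is a hypothetical weighted sum of normalized coverage gain and mean Jaccard distance to the objects already chosen, and the names `greedy_representatives` and `lam` are invented.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def greedy_representatives(rtopk, k, lam=0.5):
    """Greedily pick up to k objects.

    rtopk: dict mapping object id -> set of attracted user ids (reverse top-k).
    lam:   hypothetical weight; 1.0 scores coverage only, 0.0 diversity only.
    Returns (selected object ids in pick order, set of covered users).
    """
    all_users = set().union(*rtopk.values()) if rtopk else set()
    n_users = max(len(all_users), 1)
    selected, covered = [], set()
    while len(selected) < min(k, len(rtopk)):
        best, best_gain = None, -1.0
        for obj in sorted(rtopk):  # sorted for deterministic tie-breaking
            if obj in selected:
                continue
            users = rtopk[obj]
            # Fraction of all users that this object would newly cover.
            cov_gain = len(users - covered) / n_users
            # Mean dissimilarity to the objects picked so far (assumed measure).
            if selected:
                div_gain = sum(
                    jaccard_distance(users, rtopk[s]) for s in selected
                ) / len(selected)
            else:
                div_gain = 0.0
            gain = lam * cov_gain + (1.0 - lam) * div_gain
            if gain > best_gain:
                best, best_gain = obj, gain
        selected.append(best)
        covered |= rtopk[best]
    return selected, covered
```

With `rtopk = {"a": {1, 2, 3}, "b": {2, 3}, "c": {4, 5}}` and `k=2`, the sketch picks `"a"` first and then `"c"`, since `"b"` attracts no users that `"a"` does not already attract.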
With the social-media data explosion, near real-time queries, particularly those of a spatio-temporal nature, can be challenging. In this paper, we show how to efficiently answer queries that target recent data within very large data sets. We describe a solution that exploits a natural partitioning property that LSM-based indexes have for components, allowing us to filter out many components when answering queries. Our solution is generalizable to any LSM-based index structure, and can be applied not just on temporal fields (e.g., based on recency), but on any "time-correlated fields" such as Universally Unique Identifiers (UUIDs), user-provided integer ids, etc. We have implemented and experimentally evaluated the solution in the context of the AsterixDB system.
Title: LSM-Based Storage and Indexing: An Old Idea with Timely Benefits. Authors: Sattam Alsubaiee, M. Carey, Chen Li. Venue: Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data. Published: 2015-05-31. DOI: https://doi.org/10.1145/2786006.2786007
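The component-filtering idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not AsterixDB code: it assumes each LSM component keeps a minimum and maximum of the time-correlated field over its records, and that a range query may skip any component whose range does not overlap the query. The names `Component` and `range_query` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One immutable LSM component holding (timestamp, payload) pairs."""
    records: list = field(default_factory=list)

    @property
    def min_ts(self):
        # Smallest value of the filter field in this component.
        return min(ts for ts, _ in self.records)

    @property
    def max_ts(self):
        # Largest value of the filter field in this component.
        return max(ts for ts, _ in self.records)


def range_query(components, lo, hi):
    """Return (payloads with lo <= timestamp <= hi, components scanned)."""
    results, scanned = [], 0
    for comp in components:
        # Skip empty components and those whose [min, max] range
        # cannot contain a matching record.
        if not comp.records or comp.max_ts < lo or comp.min_ts > hi:
            continue
        scanned += 1
        results.extend(p for ts, p in comp.records if lo <= ts <= hi)
    return results, scanned
```

Because records are flushed to components roughly in arrival order, older components tend to cover older, non-overlapping ranges of a time-correlated field, which is why a query on recent data can skip most of them in this sketch.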
Michael Weiler, Klaus Arthur Schmid, N. Mamoulis, M. Renz
Modern technology for capturing geo-spatial information produces a flood of geo-spatial and geo-spatio-temporal data, accompanied by a new user mentality of voluntarily sharing information through this technology. This location information, enriched with social information, is a new source of useful knowledge. This work introduces geo-social co-location mining, the problem of finding social groups that are frequently found at the same location. This problem has applications in the social sciences, enabling research on interactions between social groups and supporting social-link prediction. It can be divided into two sub-problems. The first sub-problem, finding spatial co-location instances, requires properly addressing the inherent uncertainty in geo-social network data, which is a consequence of generally very sparse check-in data and thus very sparse trajectory information. For this purpose, we propose a probabilistic model to estimate the probability that a user is located at a given location at a given time, creating the notion of probabilistic co-locations. The second sub-problem, mining the resulting probabilistic co-location instances, requires efficient methods for large databases with a high degree of uncertainty. Our approach solves this problem by extending solutions for probabilistic frequent itemset mining. Our experimental evaluation on real (but anonymized) geo-social network data shows the high efficiency of our approach and its ability to find new social interactions.
Title: Geo-Social Co-location Mining. Authors: Michael Weiler, Klaus Arthur Schmid, N. Mamoulis, M. Renz. Venue: Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data. Published: 2015-05-31. DOI: https://doi.org/10.1145/2786006.2786010
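The notion of a probabilistic co-location in the abstract above can be made concrete with a small sketch. It is a simplification under explicit assumptions rather than the paper's model: per-user presence probabilities are taken as given, users are assumed to be independent of one another, and a group is scored by its expected support, a standard measure in probabilistic frequent itemset mining. The names `presence`, `colocation_probability`, and `expected_support` are hypothetical.

```python
from math import prod


def colocation_probability(presence, group, slot):
    """Probability that every user in `group` is present in `slot`.

    presence: dict mapping (user, slot) -> probability, where a slot is any
              hashable key standing for a (location, time) pair; missing
              entries count as probability 0.
    Assumes users' presences are independent (a simplification).
    """
    return prod(presence.get((user, slot), 0.0) for user in group)


def expected_support(presence, group, slots):
    """Expected number of slots in which the whole group is co-located."""
    return sum(colocation_probability(presence, group, s) for s in slots)
```

For example, if two users are each at a cafe at 9:00 with probability 0.5, and at a gym at 18:00 with probabilities 1.0 and 0.25, the expected support of the pair over those two slots is 0.5 × 0.5 + 1.0 × 0.25 = 0.5.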
Novia Nurain, Mohammed Eunus Ali, T. Hashem, E. Tanin
A geo-spatial object with non-deterministic boundaries and compositions is commonly known as a fuzzy geo-spatial object. The advancement of data capturing devices such as sensors and satellite imaging technologies enables us to identify fuzzy geo-spatial objects from a large and complex image of an area. Nearest neighbor (NN) query processing on fuzzy objects, which finds the nearest fuzzy object to a given query point, has been addressed recently. In this paper, we envision a new set of applications that require finding the nearest fuzzy geo-spatial object for a group of fuzzy geo-spatial query objects. For example, when an oil spill occurs at sea, the primary concern of an emergency response planner is to find the environmentally sensitive area, e.g., a port or harbor, that will be affected the most by the oil spill. To support such applications, we propose a new query type, called a fuzzy group nearest neighbor (FGNN) query. Given a set of fuzzy geo-spatial data objects and a group of fuzzy geo-spatial query objects, an FGNN query returns a fuzzy geo-spatial object that minimizes the aggregate distance to the group. We develop an efficient technique to solve FGNN queries. Our extensive experimental study reveals the efficacy and efficiency of our proposed technique.
Title: Group Nearest Neighbor Queries for Fuzzy Geo-Spatial Objects. Authors: Novia Nurain, Mohammed Eunus Ali, T. Hashem, E. Tanin. Venue: Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data. Published: 2015-05-31. DOI: https://doi.org/10.1145/2786006.2786011
T. Hashem, Shudip Datta, T. Islam, Mohammed Eunus Ali, L. Kulik, E. Tanin
An important class of location-based services (LBSs) is information queries that provide users with location information about nearby points of interest such as a restaurant, a hospital, or a gas station. To access an LBS, a user has to reveal her location to the location-based service provider (LSP). From the revealed location, the LSP can infer private information about the user's health, habits, and preferences. Thus, along with their benefits, LBSs also raise privacy concerns for users. Hence, protecting the privacy of LBS users is a major challenge. Another major challenge is to ensure the reliability and correctness of the LBSs provided by the LSP, which is known as authentication. We develop a novel authentication technique that supports variants of privacy-preserving LBSs with less storage and communication overhead. More importantly, we present a unified framework that can handle authentication for a wide range of privacy-preserving location-based queries, including range, nearest neighbor, and group nearest neighbor queries. We conduct experiments to show the efficiency and effectiveness of our approach in comparison with state-of-the-art techniques.
Title: A Unified Framework for Authenticating Privacy Preserving Location Based Services. Authors: T. Hashem, Shudip Datta, T. Islam, Mohammed Eunus Ali, L. Kulik, E. Tanin. Venue: Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data. Published: 2015-05-31. DOI: https://doi.org/10.1145/2786006.2786009