Hien To, Rúben Geraldes, C. Shahabi, S. H. Kim, H. Prendinger
With the ubiquity of smartphones, spatial crowdsourcing (SC) has emerged as a new paradigm that engages mobile users to perform tasks in the physical world. Thus, various SC techniques have been studied for performance optimization. However, little research has been done to understand workers' behavior in the real world. In this study, we designed and performed two real-world SC campaigns utilizing our mobile app, Genkii, a GPS-enabled app with which users report their affective state (e.g., happy, sad). We used Yahoo! Japan Crowdsourcing as the payment platform to reward users for reporting their affective states at different locations and times. We studied the relationship between incentives and participation by analyzing the impact of offering a fixed reward versus an increasing reward scheme. We observed that users tend to stay in a campaign longer when the provided incentives gradually increase over time. We also found that the degree of mobility is correlated with the reported information. For example, users who travel more are observed to be happier than those who travel less. Furthermore, analyzing the spatiotemporal information of the reports reveals interesting mobility patterns that are unique to spatial crowdsourcing.
Title: An empirical study of workers' behavior in spatial crowdsourcing. Published in: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, 2016-06-26. DOI: https://doi.org/10.1145/2948649.2948657
P. Baumann, E. Hirschorn, J. Masó, A. Dumitru, Vlad Merticariu
Spatio-temporal grid data form a core structure in Earth and Space sciences alike. While Array Databases have set out to support this information category, they only offer integer indexing, corresponding to equidistant grids. In reality, however, grids often have irregular structures, such as raw satellite swath data. We present an approach to modeling spatio-temporal regular and non-regular grids in a coherent manner, suitable for querying, transporting, and storing such data while remaining format independent. We briefly describe an implementation based on the combination of a relational and an array DBMS. Our model is currently under adoption as an international standard by OGC and ISO.
Title: Taming twisted cubes. Published in: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, 2016-06-26. DOI: https://doi.org/10.1145/2948649.2948650
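The contrast that the "Taming twisted cubes" abstract draws between integer-indexed equidistant grids and irregular grids can be illustrated with a minimal sketch. The class names `RegularAxis` and `IrregularAxis` are hypothetical and are not taken from the paper or from any OGC/ISO standard; the sketch only assumes that a regular axis can map a coordinate to an index by arithmetic, whereas an irregular axis has to store its coordinates explicitly and search them.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RegularAxis:
    """Equidistant axis: a coordinate maps to an integer index by arithmetic."""
    origin: float
    step: float
    size: int

    def index_of(self, coord: float) -> int:
        i = round((coord - self.origin) / self.step)
        if not 0 <= i < self.size:
            raise ValueError(f"{coord} is outside the axis")
        return i


@dataclass(frozen=True)
class IrregularAxis:
    """Non-equidistant axis: coordinates are stored explicitly, sorted ascending."""
    coords: Sequence[float]

    def index_of(self, coord: float) -> int:
        # Binary search for the insertion point, then pick the nearer neighbour.
        # Coordinates beyond either end are clamped to the first or last index.
        pos = bisect_left(self.coords, coord)
        if pos == 0:
            return 0
        if pos == len(self.coords):
            return len(self.coords) - 1
        before, after = self.coords[pos - 1], self.coords[pos]
        return pos if after - coord < coord - before else pos - 1
```

Both classes expose the same `index_of` method, so calling code can address a grid cell by coordinate without knowing which kind of axis it is dealing with; that uniformity is the point of the sketch, not the particular lookup strategy.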
Xiao-Xing Zhao, Yuanyuan Qiao, Zhongwei Si, Jie Yang, Anders Lindgren
In the era of mobile Internet, a vast amount of geo-spatial data allows us to gain further insights into human activities, which is critical for Internet Service Providers (ISPs) to provide better personalized services. With the pervasiveness of mobile Internet, much evidence shows that human mobility has a heavy impact on app usage behavior. In this paper, we propose a method based on machine learning to predict users' app usage behavior using several features of human mobility extracted from geo-spatial data in mobile Internet traces. The core idea of our method is selecting a set of mobility attributes (e.g., location, travel pattern, and mobility indicators) that have a large impact on app usage behavior and inputting them into a classification model. We evaluate our method using real-world network traffic collected by our self-developed high-speed Traffic Monitoring System (TMS). Our prediction method achieves 90.3% accuracy in our experiment, which verifies the strong correlation between human mobility and app usage behavior. Our experimental results reveal the considerable potential of geo-spatial data extracted from mobile Internet.
Title: Prediction of user app usage behavior from geo-spatial data. Published in: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, 2016-06-26. DOI: https://doi.org/10.1145/2948649.2948656
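The abstract on app usage prediction does not list the exact mobility attributes or the classifier used, so the following is only a sketch of what extracting "mobility indicators" from a location trace might look like. The function name `mobility_features` and the three chosen indicators (radius of gyration, number of distinct locations, path length) are assumptions made for illustration, and the trace is assumed to be in planar (projected) coordinates rather than latitude/longitude.

```python
from math import hypot, sqrt
from typing import Dict, List, Tuple


def mobility_features(trace: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarise one user's location trace as a small feature dictionary."""
    if not trace:
        raise ValueError("trace must contain at least one point")
    n = len(trace)
    # Centre of mass of all visited points.
    cx = sum(x for x, _ in trace) / n
    cy = sum(y for _, y in trace) / n
    # Radius of gyration: root mean squared distance from the centre of mass.
    radius = sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in trace) / n)
    # Total distance travelled along consecutive points.
    path = sum(hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(trace, trace[1:]))
    return {
        "radius_of_gyration": radius,
        "distinct_locations": float(len(set(trace))),
        "path_length": path,
    }
```

A feature dictionary of this kind could then be passed to any standard classifier; which model the authors actually used is not stated in the abstract.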
The automatic classification of patent applications into a particular patent classification system remains a challenge with many practical applications. From a computer science point of view, the task is a multi-label hierarchical classification problem, i.e., each patent application might belong to multiple classes within the class hierarchy. The problem is still especially difficult for purely text-based classifiers because patents and patent applications are often formulated in a rather generic way. Thus, additional sources of information should be used to improve class prediction. In our approach, we propose the use of location information contained in the metadata of a patent application in combination with text-based patent classification. We argue that certain technological areas often cluster in geographic regions. For example, space travel technology is often concentrated around Houston, Texas, due to the NASA facilities in this area. In many cases, the addresses of the inventors are correlated with the technological area of a given patent. Thus, the addresses can be exploited to provide additional information about the technological area. We present a geo-enriched classifier joining established methods for text-based classification with location-based topic prediction. Since the location-based prediction is not applicable to all cases, we provide a method to regulate the impact of the spatial predictor for these cases. Our experiments indicate that spatial prediction is applicable to a considerable number of patent applications and that the combination of spatial prediction and text-based classification significantly improves the prediction accuracy.
Title: Geodata supported classification of patent applications. Authors: J. Stutzki, Matthias Schubert. Published in: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, 2016-06-26. DOI: https://doi.org/10.1145/2948649.2948653
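The patent classification abstract mentions joining a text-based classifier with a spatial predictor and regulating the spatial predictor's impact, but it does not say how. One simple way to picture this is a convex combination of per-class scores, sketched below. The functions `combine_predictions` and `predict_labels`, the `spatial_weight` parameter, and the class codes in the usage note are all assumptions for illustration, not the authors' method.

```python
from typing import Dict, List


def combine_predictions(
    text_scores: Dict[str, float],
    spatial_scores: Dict[str, float],
    spatial_weight: float,
) -> Dict[str, float]:
    """Blend two per-class score dictionaries with a convex combination.

    spatial_weight = 0 ignores location; spatial_weight = 1 ignores text.
    """
    if not 0.0 <= spatial_weight <= 1.0:
        raise ValueError("spatial_weight must lie in [0, 1]")
    classes = set(text_scores) | set(spatial_scores)
    return {
        c: (1.0 - spatial_weight) * text_scores.get(c, 0.0)
        + spatial_weight * spatial_scores.get(c, 0.0)
        for c in classes
    }


def predict_labels(scores: Dict[str, float], threshold: float) -> List[str]:
    """Multi-label decision: keep every class whose blended score reaches the threshold."""
    return sorted(c for c, s in scores.items() if s >= threshold)
```

For example, with text scores `{"B64G": 0.4, "H04L": 0.5}`, spatial scores `{"B64G": 0.9}`, and a threshold of 0.5, a `spatial_weight` of 0.5 selects only `B64G`, while a weight of 0 falls back to the text-only decision and selects only `H04L`. Setting the weight to 0 for applications whose inventor addresses carry no usable signal is one way such a scheme could switch the spatial predictor off.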
With the percentage of Twitter users approaching 20% of the US population by 2019, tweets provide a good sample of the public's sentiment and opinion. Consequently, such data has been extensively used in commercial and research efforts. While prior work has analyzed the content of tweets in relation to the underlying social network of a discussion, somewhat less attention has been paid to the spatial distribution of messages and topics. This work tries to assess the locality of discussions using the concepts mentioned in tweets. Based on a global distribution of topics across the 48 contiguous states, we try to ascertain spatial topic dissimilarity by recursively subdividing the space into smaller and smaller partitions and using statistical testing to compare the distributions. Experimenting with a large Twitter dataset for the US, we can observe that locality of a discussion occurs at specific thresholds and that only 14 of the 49 most populous urban areas feature a unique discussion. Overall, this work establishes trends as to when locality in a discussion in social media occurs.
Title: Geo-fingerprinting social media content. Authors: Hatim Gazaz, A. Croitoru, P. Delamater, D. Pfoser. Published in: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, 2016-06-26. DOI: https://doi.org/10.1145/2948649.2948654
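The geo-fingerprinting abstract describes recursively subdividing space and comparing topic distributions with a statistical test, without naming the partitioning scheme or the test. The sketch below assumes a quadtree-style split and a Pearson chi-square statistic against the global topic counts; both choices, along with the names `chi_square` and `distinct_cells`, are illustrative assumptions. Comparing a cell against a reference that includes the cell itself is also a simplification.

```python
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float, str]         # (x, y, topic); planar coordinates assumed
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y), max edges excluded


def chi_square(observed: Counter, reference: Counter) -> float:
    """Pearson chi-square statistic of observed counts against reference proportions."""
    n_obs = sum(observed.values())
    n_ref = sum(reference.values())
    stat = 0.0
    for topic, ref_count in reference.items():
        expected = n_obs * ref_count / n_ref
        if expected > 0:
            stat += (observed.get(topic, 0) - expected) ** 2 / expected
    return stat


def distinct_cells(points: List[Point], box: Box, reference: Counter, critical: float,
                   min_points: int = 10, depth: int = 0,
                   max_depth: int = 4) -> List[Tuple[Box, int]]:
    """Return (cell, depth) for the largest cells whose topic mix departs from the reference."""
    x0, y0, x1, y1 = box
    inside = [p for p in points if x0 <= p[0] < x1 and y0 <= p[1] < y1]
    if len(inside) < min_points:
        return []                      # too few messages to test this cell
    counts = Counter(topic for _, _, topic in inside)
    if chi_square(counts, reference) > critical:
        return [(box, depth)]          # locality detected at this scale
    if depth == max_depth:
        return []
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    found: List[Tuple[Box, int]] = []
    for quad in ((x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)):
        found.extend(distinct_cells(inside, quad, reference, critical,
                                    min_points, depth + 1, max_depth))
    return found
```

The depth at which a cell is first reported plays the role of the "threshold" at which a discussion becomes local. The caller supplies `critical`, the critical value of the chi-square distribution for the chosen significance level and number of topics.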
In this paper, we present a novel framework for estimating social point-of-interest (POI) boundaries, also termed GeoSocialBound, utilizing spatio-textual information based on geo-tagged tweets. We start by defining a social POI boundary as one small-scale cluster containing its POI center, geographically formed with a convex polygon. Motivated by an insightful observation with regard to estimation accuracy, we formulate a constrained optimization problem, in which we are interested in finding the radius of a circle such that a newly defined objective function is maximized. To solve this problem, we introduce an efficient optimal estimation algorithm whose runtime complexity is linear in the number of geo-tags in a dataset. In addition, we empirically evaluate the estimation performance of our GeoSocialBound algorithm for various environments and validate the complexity analysis. As a result, vital information on how to obtain real-world GeoSocialBounds with a high degree of accuracy is provided.
Title: GeoSocialBound: an efficient framework for estimating social POI boundaries using spatio-textual information. Authors: Dung D. Vu, Hien To, Won-Yong Shin, C. Shahabi. Published in: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, 2016-06-26. DOI: https://doi.org/10.1145/2948649.2948652
Place similarity has a central role in geographic information retrieval and geographic information systems, where spatial proximity is frequently just a poor substitute for semantic relatedness. For applications such as toponym disambiguation, alternative measures are thus required to answer the non-trivial question of place similarity in a given context. In this paper, we discuss a novel approach to the construction of a network of locations from unstructured text data. By deriving similarity scores based on the textual distance of toponyms, we obtain a kind of relatedness that encodes the importance of the co-occurrences of place mentions. Based on the text of the English Wikipedia, we construct and provide such a network of place similarities, including entity linking to Wikidata as an augmentation of the contained information. In an analysis of centrality, we explore the network's capability of capturing the similarity between places. An evaluation of the network for the task of toponym disambiguation on the AIDA CoNLL-YAGO dataset reveals a performance that is in line with state-of-the-art methods.
Title: So far away and yet so close: augmenting toponym disambiguation and similarity with text-based networks. Authors: Andreas Spitz, Johanna Geiß, Michael Gertz. Published in: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, 2016-06-26. DOI: https://doi.org/10.1145/2948649.2948651
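To make the idea of similarity scores derived from the textual distance of toponyms more concrete, here is a minimal sketch that builds a weighted co-occurrence network from place mentions in a single document. It assumes that textual distance is measured in sentences and that each co-occurrence contributes a weight of exp(-distance); the abstract does not specify either, so both are assumptions, as are the function name `build_place_network` and the `max_distance` cut-off.

```python
from collections import defaultdict
from itertools import combinations
from math import exp
from typing import Dict, List, Tuple


def build_place_network(
    mentions: List[Tuple[int, str]], max_distance: int = 2
) -> Dict[Tuple[str, str], float]:
    """Accumulate edge weights between toponyms mentioned close together.

    mentions: (sentence_index, toponym) pairs from one document.
    Returns a mapping from an alphabetically ordered toponym pair to its weight.
    """
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for (s1, a), (s2, b) in combinations(mentions, 2):
        if a == b:
            continue                      # a place is not linked to itself
        distance = abs(s1 - s2)
        if distance <= max_distance:
            # Closer mentions contribute more; same-sentence pairs add 1.0.
            weights[tuple(sorted((a, b)))] += exp(-distance)
    return dict(weights)
```

Summing these dictionaries over a whole corpus would give a network in which frequently and closely co-mentioned places end up with heavy edges, regardless of how far apart they are geographically.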
With the growing data volume and popularity of Web services and Location-Based Services (LBS), new spatio-textual applications are emerging. These applications are contributing to a deluge of geo-tagged documents. As a result, top-k spatial keyword searches have attracted a lot of attention and a number of spatio-textual indexes have been proposed. However, these indexes do not consider the "recency" of the indexed documents. Part of the challenge is due to the fact that the textual relevance score measures that these indexes use require all documents to be inspected. To address these issues, we propose the idea of "dynamic ranking" of spatio-textual objects. We also introduce a novel index, called STARI, which uses this ranking method to retrieve the most recent top-k relevant objects. Experimental evaluation demonstrates that our system can support high document update rates and low query latency.
Title: Dynamically ranked top-k spatial keyword search. Authors: S. Ray, B. Nickerson. Published in: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, 2016-06-26. DOI: https://doi.org/10.1145/2948649.2948655
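The abstract on dynamically ranked search does not describe how STARI scores or indexes documents, so the following is not a rendering of that index. It is only a brute-force sketch of what a recency-aware top-k spatial keyword query computes, assuming a score that mixes spatial proximity and keyword overlap and then decays with document age. The `Doc` class, the `score` and `top_k` functions, and the `alpha` and `half_life` parameters are all hypothetical.

```python
import heapq
from dataclasses import dataclass
from math import hypot
from typing import FrozenSet, List


@dataclass(frozen=True)
class Doc:
    doc_id: str
    x: float
    y: float
    terms: FrozenSet[str]
    timestamp: float  # seconds since some epoch


def score(doc: Doc, qx: float, qy: float, q_terms: FrozenSet[str],
          now: float, half_life: float, alpha: float = 0.5) -> float:
    """Proximity and keyword overlap, discounted by document age."""
    overlap = len(doc.terms & q_terms) / len(q_terms)     # fraction of query terms matched
    proximity = 1.0 / (1.0 + hypot(doc.x - qx, doc.y - qy))
    decay = 0.5 ** ((now - doc.timestamp) / half_life)    # halves every half_life seconds
    return (alpha * proximity + (1.0 - alpha) * overlap) * decay


def top_k(docs: List[Doc], qx: float, qy: float, q_terms: FrozenSet[str],
          now: float, k: int, half_life: float = 3600.0) -> List[Doc]:
    """Scan all documents and return the k highest-scoring keyword matches."""
    if not q_terms:
        raise ValueError("query must contain at least one term")
    matching = [d for d in docs if d.terms & q_terms]
    return heapq.nlargest(
        k, matching, key=lambda d: score(d, qx, qy, q_terms, now, half_life))
```

Because this version inspects every document on every query, it has exactly the cost problem the abstract raises; an index would aim to return the same kind of ranking without the full scan.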
Title: Proceedings of the Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data. DOI: https://doi.org/10.1145/2948649