Louai Alarabi, Bin Cao, Liwei Zhao, M. Mokbel, Anas Basalamah
Recently, many ride sharing systems have been commercially introduced (e.g., Uber, Flinc, and Lyft), forming a multi-billion-dollar industry. The main idea is to match people requesting a certain ride to other people who are acting as drivers in their own spare time. The matching algorithm run by these services is very simple and ignores a wide sector of users who can be exploited to maximize the benefits of these services. In this demo, we demonstrate SHAREK, a driver-rider matching algorithm that can be embedded inside existing ride sharing services to enhance the quality of their matching. SHAREK has the potential to boost the performance and widen the user base and applicability of existing ride sharing services. This is mainly because, within its matching technique, SHAREK takes into account user preferences in terms of the maximum waiting time the rider is willing to tolerate before being picked up, as well as the maximum cost that the rider is willing to pay. Then, during execution, SHAREK applies a set of smart filters that enable it to do the matching efficiently without the need for many expensive shortest path computations.
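The filtering idea in the abstract can be illustrated with a minimal sketch. It assumes one common pruning technique, a straight-line lower bound on road distance, which is not necessarily one of SHAREK's actual filters. The cost model (a flat `rate` per unit of distance driven), the field names, and the `road_distance` callback are all hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance; never larger than the road distance."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def match_drivers(rider, drivers, road_distance, speed=1.0, rate=1.0):
    """Return (driver id, wait, cost) for drivers within the rider's limits.

    road_distance stands in for an expensive shortest path computation.
    It is only called for drivers that survive the cheap filters.
    """
    matches = []
    trip_lb = euclidean(rider["pickup"], rider["dropoff"])
    for driver in drivers:
        pickup_lb = euclidean(driver["pos"], rider["pickup"])
        # Cheap filter 1: even the optimistic wait exceeds the limit.
        if pickup_lb / speed > rider["max_wait"]:
            continue
        # Cheap filter 2: even the optimistic cost exceeds the limit.
        if rate * (pickup_lb + trip_lb) > rider["max_cost"]:
            continue
        # Exact check, reached only by the surviving candidates.
        pickup = road_distance(driver["pos"], rider["pickup"])
        trip = road_distance(rider["pickup"], rider["dropoff"])
        wait, cost = pickup / speed, rate * (pickup + trip)
        if wait <= rider["max_wait"] and cost <= rider["max_cost"]:
            matches.append((driver["id"], wait, cost))
    return matches

# Illustrative data only: Manhattan distance plays the role of the road network.
calls = []
def manhattan(a, b):
    calls.append((a, b))
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

rider = {"pickup": (0, 0), "dropoff": (3, 4), "max_wait": 5, "max_cost": 20}
drivers = [{"id": "near", "pos": (1, 1)}, {"id": "far", "pos": (100, 100)}]
result = match_drivers(rider, drivers, manhattan)
```

In this toy run the distant driver is rejected by the straight-line bound alone, so the expensive distance function is called only for the nearby driver.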
{"title":"A demonstration of SHAREK: an efficient matching framework for ride sharing systems","authors":"Louai Alarabi, Bin Cao, Liwei Zhao, M. Mokbel, Anas Basalamah","doi":"10.1145/2996913.2996983","DOIUrl":"https://doi.org/10.1145/2996913.2996983","url":null,"PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82427406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A variety of optimal location problems have been extensively studied in the literature. However, few visualization systems have been developed to illustrate the optimal location selection process. In this demonstration, we present a system that visualizes an advanced solution that can efficiently answer multi-criteria optimal location updating queries by incrementally updating the Minimum Overlapping Voronoi Diagram (MOVD) model. Our system not only displays an example of a practical multi-criteria optimal location updating query, but also visualizes the process of query evaluation in a more intuitive manner. With the object insertion and deletion operations defined over the MOVD model, any object change in an MOVD can be represented by removing the objects from the initial datasets and adding them back with updated attributes. Moreover, the Haxe toolkit is used to provide friendly and flexible user interfaces in our system.
{"title":"A framework for updating multi-criteria optimal location query (demo paper)","authors":"P. Harn, Ji Zhang, Min-Te Sun, Wei-Shinn Ku","doi":"10.1145/2996913.2997012","DOIUrl":"https://doi.org/10.1145/2996913.2997012","url":null,"PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84133755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abhranil Chatterjee, Janit Anjaria, Sourav Roy, A. Ganguli, K. Seal
With the recent explosion of the e-commerce industry in India, the problem of address geocoding, that is, transforming textual address descriptions into geographic references such as latitude and longitude coordinates, has emerged as a core problem for supply chain management. Some of the major areas that rely on precise and accurate address geocoding are supply chain fulfilment, supply chain analytics and logistics. In this paper, we present some of the challenges faced in practice while building an address geocoding engine as a core capability at Flipkart. We discuss the unique challenges of building a geocoding engine for a rapidly developing country like India, such as fuzzy region boundaries, dynamic topography and a lack of convention in the spelling of toponyms, to name a few. We motivate the need for building a reliable and precise address geocoding system from a business perspective and argue why some of the commercially available solutions do not suffice for our requirements. SAGEL has evolved through 3 cycles of solution prototypes and pilot experiments. We describe the lessons learned from each of these phases and how we incorporated them to get to the first production-ready version. We describe how we store and index map data on a SolrCloud cluster of Apache Solr, an open-source search platform, and the core geocoding algorithm, which works post-retrieval to determine the best matches among a set of candidate results. We give a brief description of the system architecture and provide accuracy results for our geocoding engine by measuring the deviations of geocoded customer addresses across India from the verified latitude and longitude coordinates of those addresses, for a sizeable address set. We also measure and report our system's ability to geocode up to different region levels, like city, locality or building. We compare our results with those of the geocoding service provided by Google against a set of addresses for which we have verified latitude-longitude coordinates and show that our geocoding engine is almost as accurate as Google's, while having a higher coverage.
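The accuracy measurement described above, the deviation of a geocoded point from a verified one, can be sketched as a great-circle distance. This is a minimal illustration using the standard haversine formula. The abstract does not say which distance measure or thresholds SAGEL uses, and the function names, threshold, and coordinates below are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def share_within(pairs, threshold_m):
    """Fraction of (geocoded, verified) pairs whose deviation is at most threshold_m."""
    deviations = [haversine_m(*geocoded, *verified) for geocoded, verified in pairs]
    return sum(d <= threshold_m for d in deviations) / len(deviations)

# Made-up coordinates for illustration only; not data from the paper.
pairs = [
    ((12.9716, 77.5946), (12.9716, 77.5946)),  # exact match
    ((12.9716, 77.5946), (12.9816, 77.5946)),  # 0.01 degrees of latitude apart
]
```

Reporting the share of addresses within a few distance thresholds is one simple way to summarise such deviations across a large address set.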
{"title":"SAGEL: smart address geocoding engine for supply-chain logistics","authors":"Abhranil Chatterjee, Janit Anjaria, Sourav Roy, A. Ganguli, K. Seal","doi":"10.1145/2996913.2996917","DOIUrl":"https://doi.org/10.1145/2996913.2996917","url":null,"PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"116 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80687089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting the movement of crowds in a city is strategically important for traffic management, risk assessment, and public safety. In this paper, we propose predicting two types of flows of crowds in every region of a city based on big data, including human mobility data, weather conditions, and road network data. To develop a practical solution for citywide traffic prediction, we first partition the map of a city into regions using both its road network and historical records of human mobility. Our problem differs from predicting each individual's movements or each road segment's traffic conditions, which is computationally costly and unnecessary from the perspective of public safety on a citywide scale. To model the multiple complex factors affecting crowd flows, we decompose flows into three components: seasonal (periodic patterns), trend (changes in periodic patterns), and residual flows (instantaneous changes). The seasonal and trend models are built as intrinsic Gaussian Markov random fields, which can cope with noisy and missing data, whereas a residual model exploits the spatio-temporal dependence among different flows and regions, as well as the effect of weather. Experimental results on three real-world datasets show that our method is scalable and significantly outperforms all baselines in terms of accuracy.
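The three-component decomposition described above can be written compactly. The sketch below assumes the components combine additively, which the abstract does not state explicitly. The symbols are notation introduced here for illustration: $x_{r,t}$ is a crowd flow in region $r$ at time $t$, and $s$, $y$ and $e$ are its seasonal, trend and residual parts.

```latex
x_{r,t} = s_{r,t} + y_{r,t} + e_{r,t}
```

Under this reading, a forecast for a future time step would be the sum of the three separately predicted components.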
{"title":"FCCF: forecasting citywide crowd flows based on big data","authors":"Minh X. Hoang, Yu Zheng, Ambuj K. Singh","doi":"10.1145/2996913.2996934","DOIUrl":"https://doi.org/10.1145/2996913.2996934","url":null,"PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78855609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large scale disasters cause severe social disorder and trigger mass evacuation activities. Managing the evacuation shelters efficiently is crucial for disaster management. Kumamoto prefecture, Japan, was hit by an enormous (Magnitude 7.3) earthquake on April 16, 2016. As a result, more than 10,000 buildings were severely damaged and over 100,000 people had to evacuate from their homes. After the earthquake, it took the decision makers several days to grasp the locations where people were evacuating, which delayed supply distribution and rescue. This situation was made even more complex since some people evacuated to places that were not designated as evacuation shelters. Conventional methods for grasping evacuation hotspots require on-foot field surveys that take time and are difficult to execute in the confusion right after the hazard. We propose a novel framework to efficiently estimate the evacuation hotspots after large disasters using location data collected from smartphones. To validate our framework and show the useful analysis that its output enables, we demonstrated the framework on the Kumamoto earthquake using GPS data of smartphones collected by Yahoo Japan. We verified that our estimation accuracy of evacuation hotspots was very high by checking the located facilities and also by comparing the population transition results with newspaper reports. Additionally, we demonstrated analyses using our framework's outputs that would help decision makers, such as the population transition and function period of each hotspot. The efficiency of our framework is also validated by checking the processing time, showing that it could be utilized efficiently in disasters of any scale. Our framework provides useful output for decision makers who manage evacuation shelters after various kinds of large scale disasters.
{"title":"A framework for evacuation hotspot detection after large scale disasters using location data from smartphones: case study of Kumamoto earthquake","authors":"T. Yabe, K. Tsubouchi, Akihito Sudo, Y. Sekimoto","doi":"10.1145/2996913.2997014","DOIUrl":"https://doi.org/10.1145/2996913.2997014","url":null,"PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90412503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bicycle-sharing systems (BSSs), which provide short-term shared bike usage services for the public, are becoming very popular in many large cities. The accelerating demand for bike travel from the public has driven several significant expansions of many BSSs to place additional bikes and stations in their extended service regions. Meanwhile, to capture individuals' traveling needs more precisely during the expansion, many BSSs have set up websites to receive station location suggestions from the public. In this paper, we study the bike station re-deployment problem in BSS expansion. Besides the historical bike usage and construction cost information, the crowd suggestions are also incorporated in the problem. The station re-deployment problem is very challenging to solve, and it covers two sub-tasks simultaneously: (1) bike station location identification, and (2) bike dock assignment (to the deployed stations). To address the problem, a novel bike station re-deployment framework, CrowdPlanning, is introduced in this paper. In both the station deployment and capacity assignment tasks, CrowdPlanning fuses different categories of spatial information, including the crowd suggestions, individuals' historical bike usage and the construction costs. By formulating these two tasks as two optimization problems, CrowdPlanning can identify the optimal expansion strategies for the BSSs. Extensive experiments are conducted on a real-world BSS and crowd suggestion dataset to demonstrate the effectiveness of the CrowdPlanning framework.
{"title":"Bicycle-sharing systems expansion: station re-deployment through crowd planning","authors":"Jiawei Zhang, Xiao Pan, Moyin Li, Philip S. Yu","doi":"10.1145/2996913.2996926","DOIUrl":"https://doi.org/10.1145/2996913.2996926","url":null,"PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83728517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A location corroboration of a person is a proof, in the form of a digital record, indicating that this person was at a particular place at a given time. That is, given a user u, a location l and a time t, a location corroboration is certified evidence that u was at location l at time t. Such corroborations can be used in legal procedures, help solve personal disputes or enable services that rely on knowing with certainty the location of a user at a given time. A corroboration without traces means that the user's location is not stored on any public server or by any other public entity, to protect the user's privacy. In this paper, we present the problem of producing a location corroboration without traces, using a mobile device, and we discuss possible solutions to it.
{"title":"Location corroborations by mobile devices without traces","authors":"Y. Kanza","doi":"10.1145/2996913.2997010","DOIUrl":"https://doi.org/10.1145/2996913.2997010","url":null,"PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72870961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Goncalves, T. V. Tilburg, K. Kyzirakos, F. Alvanaki, P. Koutsourakis, B. V. Werkhoven, W. V. Hage
3D digital city models, important for urban planning, are currently constructed from massive point clouds obtained through airborne LiDAR (Light Detection and Ranging). They are semantically enriched with information obtained from auxiliary GIS data such as cadastral data, which contains information about the boundaries of properties, road networks, rivers, lakes, etc. Technical advances in LiDAR data acquisition systems have made possible the rapid acquisition of high resolution topographical information for an entire country. Such data sets are now reaching the trillion-point barrier. To cope with this data deluge and provide up-to-date 3D digital city models on demand, current geospatial management strategies should be re-thought. This work presents a column-oriented Spatial Database Management System which provides in-situ data access, effective data skipping, efficient spatial operations, and interactive data visualization. Its efficiency and scalability are demonstrated using a dense LiDAR scan of The Netherlands consisting of 640 billion points and the latest cadastral information, and are compared with those of PostGIS.
{"title":"A spatial column-store to triangulate the Netherlands on the fly.","authors":"R. Goncalves, T. V. Tilburg, K. Kyzirakos, F. Alvanaki, P. Koutsourakis, B. V. Werkhoven, W. V. Hage","doi":"10.1145/2996913.2997005","DOIUrl":"https://doi.org/10.1145/2996913.2997005","url":null,"PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84966063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User identification across domains has drawn considerable research effort in recent years. Although most existing works focus on user identification in a single space, in this paper we make a first attempt to identify users by fusing their activities in cyber space and physical space, which helps us obtain a comprehensive understanding of users' online behaviours as well as their offline visits. Our key insight for tackling this problem is that we can build a connection between the cyber space and the physical space through the stable location distribution of IP addresses. Thus, we propose a novel framework for user identification in cyber-physical space, which consists of three key steps: 1) modeling the location distribution of each IP address; 2) computing the co-occurrence with an inverted index to reduce the space and time cost; and 3) applying a learning-to-rank tactic to fuse the user's features shared in both spaces to improve the accuracy. We conduct experiments to identify individual users from mobile query logs (generated in cyber space) and trajectory data (generated in physical space) to demonstrate the efficiency and effectiveness of our framework.
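Step 2 above, counting co-occurrences through an inverted index, can be sketched generically as follows. This is an illustration of the general technique, not the paper's implementation. It assumes both data sources have already been reduced to events of the form (identifier, location cell, hour), and the function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def build_inverted_index(trajectory_events):
    """Map each (cell, hour) key to the set of physical users seen there."""
    index = defaultdict(set)
    for user, cell, hour in trajectory_events:
        index[(cell, hour)].add(user)
    return index

def co_occurrence(query_events, index):
    """For each online account, count how often it shares a key with each user."""
    counts = defaultdict(Counter)
    for account, cell, hour in query_events:
        # Only users indexed under this exact key are touched, so there is
        # no pairwise comparison of every account against every user.
        for user in index.get((cell, hour), ()):
            counts[account][user] += 1
    return counts

# Illustrative events only; not data from the paper.
trajectory_events = [
    ("p1", "c1", 9), ("p1", "c2", 18),
    ("p2", "c1", 9), ("p2", "c3", 18),
]
query_events = [("q1", "c1", 9), ("q1", "c2", 18)]

index = build_inverted_index(trajectory_events)
counts = co_occurrence(query_events, index)
best = counts["q1"].most_common(1)
```

The resulting counts are the kind of feature a ranking step could consume when choosing the most likely physical user for each account.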
{"title":"User identification in cyber-physical space: a case study on mobile query logs and trajectories","authors":"Tianyi Hao, Jingbo Zhou, Yunsheng Cheng, Longbo Huang, Haishan Wu","doi":"10.1145/2996913.2997017","DOIUrl":"https://doi.org/10.1145/2996913.2997017","url":null,"PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"38 2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87675824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work is motivated by a smart car application which analyses streams of data generated from cars to enhance transportation safety. We treated the problem as real-time abnormal driving behaviour detection using spatio-temporal data collected from mobile devices, including GPS location, speed and steering angle. A concise summary was proposed to summarise spatial patterns from GPS trajectory data for efficient real-time anomaly detection. An approach solving this problem by nearest neighbour search has O(n) space and O(log(n) + k) query time complexity, where k is the neighbourhood size and n is the data size. On the other hand, the concise summary approach requires only O(ε * n) memory space and has O(log(ε * n)) query time complexity, where ε is several orders of magnitude smaller than one. Experiments with two large datasets from Porto and Beijing showed that our method used only a few megabytes to summarise datasets with n = 80 million data points and was able to process 30K queries per second, which was several orders of magnitude faster than the baseline approach. The work also discusses interesting spatio-temporal patterns of abnormal driving behaviour found in the real-world datasets, demonstrating its potential application in industries including insurance, transportation safety enhancement and city transport management.
{"title":"A concise summary of spatial anomalies and its application in efficient real-time driving behaviour monitoring","authors":"Hoang Thanh Lam","doi":"10.1145/2996913.2996989","DOIUrl":"https://doi.org/10.1145/2996913.2996989","url":null,"abstract":"This work is motivated by a smart car application which analyses streams of data generated from cars to enhance transportation safety. We treated the problem as real-time abnormal driving behaviour detection using spatio-temporal data collected from mobile devices, including GPS location, speed and steering angle. A concise summary was proposed to summarise spatial patterns from GPS trajectory data for efficient real-time anomaly detection. An approach solving this problem by nearest neighbour search has O(n) space and O(log(n) + k) query time complexity, where k is the neighbourhood size and n is the data size. On the other hand, the concise summary approach requires only O(ε * n) memory space and has O(log(ε * n)) query time complexity, where ε is several orders of magnitude smaller than one. Experiments with two large datasets from Porto and Beijing showed that our method used only a few megabytes to summarise datasets with n = 80 million data points and was able to process 30K queries per second, which was several orders of magnitude faster than the baseline approach. 
The work also discusses interesting spatio-temporal patterns of abnormal driving behaviour found in the real-world datasets, demonstrating its potential application in industries including insurance, transportation safety enhancement and city transport management.","PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"137 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85369975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}