In this talk we present two historical models of human migration from the 19th and 20th centuries, and discuss how they apply to location data on the Web, in the 21st Century.
{"title":"Two Ways of Thinking About Where People Go","authors":"Vanessa Murdock","doi":"10.1145/2663713.2664429","DOIUrl":"https://doi.org/10.1145/2663713.2664429","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"114784125","RegionNum":0,"Total":0}
Record linkage is the task of identifying which records in one or more data collections refer to the same entity, and the address is one of the most commonly used fields in databases. Hence, segmenting raw addresses into a set of semantic fields is the primary step in this task. In this paper, we present a probabilistic address parsing system based on the Hidden Markov Model. We also introduce several novel approaches to synthetic training data generation to build robust models for noisy real-world addresses, obtaining a 95.6% F-measure. Furthermore, we demonstrate the viability and efficiency of this system for large-scale data by scaling it up to parse billions of addresses.
{"title":"HMM-based Address Parsing with Massive Synthetic Training Data Generation","authors":"Xiang Li, Hakan Kardes, Xin Wang, Ang Sun","doi":"10.1145/2663713.2664430","DOIUrl":"https://doi.org/10.1145/2663713.2664430","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"125309538","RegionNum":0,"Total":0}
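To illustrate the kind of segmentation the abstract above describes, the following is a minimal sketch of Viterbi decoding over a toy hidden Markov model that labels address tokens. The label set, all probabilities, and the `emission` heuristic are hypothetical values chosen for illustration; they are not the authors' model, features, or training data.

```python
import math

# Hypothetical label set; the paper's actual semantic fields are not given in the abstract.
STATES = ["NUMBER", "STREET", "CITY"]

# Toy start and transition probabilities (assumed, not learned from data).
START = {"NUMBER": 0.8, "STREET": 0.15, "CITY": 0.05}
TRANS = {
    "NUMBER": {"NUMBER": 0.05, "STREET": 0.9, "CITY": 0.05},
    "STREET": {"NUMBER": 0.05, "STREET": 0.4, "CITY": 0.55},
    "CITY": {"NUMBER": 0.05, "STREET": 0.05, "CITY": 0.9},
}

STREET_WORDS = {"st", "street", "ave", "avenue", "rd", "road"}


def emission(state, token):
    """Toy emission probability from simple token features (an assumption)."""
    tok = token.lower()
    if tok.isdigit():
        return {"NUMBER": 0.9, "STREET": 0.05, "CITY": 0.05}[state]
    if tok in STREET_WORDS:
        return {"NUMBER": 0.01, "STREET": 0.9, "CITY": 0.09}[state]
    return {"NUMBER": 0.02, "STREET": 0.49, "CITY": 0.49}[state]


def viterbi(tokens):
    """Return the most probable label sequence for the tokens (log-space Viterbi)."""
    if not tokens:
        return []
    # scores[s] = best log-probability of any label path ending in state s
    scores = {s: math.log(START[s]) + math.log(emission(s, tokens[0])) for s in STATES}
    backpointers = []
    for token in tokens[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(emission(s, token)))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


labels = viterbi("12 Main St Springfield".split())
print(labels)
```

In a real system the probabilities would be estimated from labeled (or, as in the paper, synthetically generated) training addresses rather than written by hand.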
C. Chang, Yao-Chung Fan, Kuo-Chen Wu, Arbee L. P. Chen
In recent years, smart devices have become a ubiquitous medium supporting many kinds of functionality and are widely accepted by ordinary users. One distinguishing feature of smart devices is the ability to determine the physical location of the device, and numerous applications based on user location information have been proposed. Although this potential has been foreseen, location-based services fundamentally lack an effective and scalable mechanism to bridge the gap between machine-observed locations and human-understandable places. In this study, we address this fundamental problem. Unlike existing solutions, we start from a novel perspective: we cast place semantic understanding as a classification problem and employ machine learning techniques to automatically infer the types of places. The key observation is that human behavior is not random; e.g., people visit restaurants around noon, go to work in the daytime, and stay at home at night. Hence, with properly selected features, place type semantics can be inferred automatically. This paper summarizes our approach and findings on leveraging human behavior patterns to infer the type of a place. Experiments using month-long trace logs from recruited participants demonstrate the effectiveness of the proposed method.
{"title":"On the Semantic Annotation of Daily Places: A Machine-Learning Approach","authors":"C. Chang, Yao-Chung Fan, Kuo-Chen Wu, Arbee L. P. Chen","doi":"10.1145/2663713.2664424","DOIUrl":"https://doi.org/10.1145/2663713.2664424","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"131961592","RegionNum":0,"Total":0}
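The idea of inferring a place type from when people visit it can be sketched as follows. This is only an illustration under stated assumptions: the time-of-day bins, the nearest-centroid classifier, and the tiny training set are all made up here, and the abstract does not say which features or learning algorithm the authors actually used.

```python
from collections import Counter

# Hypothetical time-of-day bins used as features (an assumption, not from the paper).
BINS = {
    "night": range(0, 6),
    "morning": range(6, 11),
    "noon": range(11, 14),
    "afternoon": range(14, 18),
    "evening": range(18, 24),
}
BIN_NAMES = list(BINS)


def features(visit_hours):
    """Turn a list of visit hours (0-23) into a normalized time-of-day histogram."""
    counts = Counter()
    for hour in visit_hours:
        for name, hours in BINS.items():
            if hour in hours:
                counts[name] += 1
    total = sum(counts.values()) or 1
    return [counts[name] / total for name in BIN_NAMES]


def train(labeled_visits):
    """Compute one mean feature vector (centroid) per place type."""
    sums, n = {}, Counter()
    for label, hours in labeled_visits:
        acc = sums.setdefault(label, [0.0] * len(BIN_NAMES))
        for i, value in enumerate(features(hours)):
            acc[i] += value
        n[label] += 1
    return {label: [v / n[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, visit_hours):
    """Return the place type whose centroid is closest (squared Euclidean distance)."""
    vec = features(visit_hours)

    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))

    return min(centroids, key=dist)


# Made-up training data: (place type, hours of day at which visits were observed)
training = [
    ("home", [22, 23, 0, 1, 2, 7]),
    ("home", [21, 23, 1, 3, 5]),
    ("work", [9, 10, 14, 15, 16]),
    ("work", [8, 9, 11, 15, 17]),
    ("restaurant", [12, 12, 13, 19]),
    ("restaurant", [11, 12, 13, 12]),
]
centroids = train(training)
guess = predict(centroids, [12, 13, 12, 20])
print(guess)
```

A nearest-centroid rule is used only because it needs no external library; any standard classifier could be substituted once the visit logs are turned into feature vectors.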
With the proliferation of smartphones and the increasing popularity of social media, people have developed habits of posting not only their thoughts and opinions, but also content concerning their whereabouts. On such highly-interactive yet informal social media platforms, people make heavy use of informal language, including when it comes to locative expressions. Such usage inhibits the ability of traditional Natural Language Processing approaches to retrieve geospatial information from social media text. In this research, we: (1) develop a medium-scale corpus of "locative expressions" derived from a variety of social media sources; (2) benchmark the performance of a range of geoparsers over the corpus, with the finding that even the best-performing systems are substantially lacking; and (3) carry out extensive error analysis to suggest ways of improving the accuracy and robustness of geoparsers.
{"title":"Automatic Identification of Locative Expressions from Social Media Text: A Comparative Analysis","authors":"Fei Liu, M. Vasardani, Timothy Baldwin","doi":"10.1145/2663713.2664426","DOIUrl":"https://doi.org/10.1145/2663713.2664426","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"134043795","RegionNum":0,"Total":0}
An indoor positioning system (IPS) identifies the positions of various indoor objects and is a key technology for achieving sophisticated Indoor Location-Aware Services (InLAS). In most conventional systems, InLAS and IPS are tightly coupled; that is, one system is not designed to reuse the indoor location data and programs of another system. This makes individual systems complex and difficult to manage. To cope with this problem, we propose the Data Model for Indoor Location (DM4InL), which prescribes a common data schema independent of the implementation of the IPS or the usage of the InLAS. The proposed DM4InL represents the location of every indoor object in a standard way, using three kinds of models: location, building, and object models. We also design a fundamental API, which implements typical queries to the indoor location data from external applications. The proposed method achieves loose coupling of InLAS and IPS, which significantly improves efficiency and reusability in InLAS development.
{"title":"Considering Common Data Model for Indoor Location-aware Services","authors":"L. Niu, S. Matsumoto, S. Saiki, Masahide Nakamura","doi":"10.1145/2663713.2664423","DOIUrl":"https://doi.org/10.1145/2663713.2664423","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130563563","RegionNum":0,"Total":0}
This paper is concerned with the automatic prediction of the zoom level at which to present a map with results for an informal location description. We propose the use of identifiability (relative uniqueness) and zoom level of each component geospatial expression (GE) in the location description, as a means of predicting the appropriate zoom level for the overall description. We apply a simple classification approach to zoom level prediction, and compare results using gold-standard and automatically-inferred GE information. We find the approach to have strong promise, including relative to the zoom level used in results from Google Maps for our location descriptions dataset.
{"title":"Automatic Zoom Level Prediction for Informal Location Descriptions","authors":"Igor Tytyk, Timothy Baldwin","doi":"10.1145/2663713.2664427","DOIUrl":"https://doi.org/10.1145/2663713.2664427","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"123666350","RegionNum":0,"Total":0}
Location is nowadays an important aspect of the Web. One scenario in this respect is archives or collections of geo-tagged media items. More concretely, we can think of collections in the arts and humanities available via OAI-PMH (a protocol for metadata harvesting) on the Web, or Web-accessible personal media archives maintained in a peer-to-peer manner. For search problems in such scenarios, source selection becomes an important aspect. For example, we would like to access only those collections containing media items in a certain geospatial region (perhaps we are interested only in images from Shanghai). Here, the geospatial search criterion allows for high selectivity. What is needed in such a scenario are expressive yet compact representations or descriptions of the "geospatial footprint" of each collection. A minimum bounding rectangle would be a trivial but not very accurate option. Generally, summarization techniques for this purpose fall into three categories: geometric approaches, space partitioning approaches, and hybrid approaches. In this work, we present novel hybrid techniques, which mostly apply a set of approximating minimum area rectangles for subspace description together with quantization techniques in order to increase the selectivity of the summaries while keeping the storage requirements small.
{"title":"Hybrid Quantized Resource Descriptions for Geospatial Source Selection","authors":"Stefan Kufer, A. Henrich","doi":"10.1145/2663713.2664428","DOIUrl":"https://doi.org/10.1145/2663713.2664428","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130384690","RegionNum":0,"Total":0}
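The remark that a minimum bounding rectangle is a trivial but inaccurate footprint can be made concrete with a short sketch. The code below implements only that baseline, not the hybrid quantized descriptions the paper proposes; the collection names, coordinates, and query box are invented for illustration, and the rectangle test ignores the antimeridian.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in (longitude, latitude) degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap if their ranges overlap on both axes.
        return (self.min_lon <= other.max_lon and other.min_lon <= self.max_lon
                and self.min_lat <= other.max_lat and other.min_lat <= self.max_lat)

    def contains(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def bounding_rect(points):
    """Minimum bounding rectangle of a non-empty list of (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return Rect(min(lons), min(lats), max(lons), max(lats))


# Hypothetical collections of geo-tagged items as (longitude, latitude) pairs
collections = {
    "shanghai_photos": [(121.47, 31.23), (121.50, 31.24), (121.44, 31.20)],
    "tokyo_and_hong_kong": [(139.69, 35.69), (114.17, 22.32)],
    "paris_photos": [(2.35, 48.86), (2.29, 48.86)],
}
footprints = {name: bounding_rect(pts) for name, pts in collections.items()}

# Rough query box around Shanghai (assumed coordinates)
query = Rect(120.8, 30.6, 122.2, 31.9)

# Sources selected from their summaries alone
selected = [name for name, fp in footprints.items() if fp.intersects(query)]

# Sources that really hold an item inside the query region
relevant = [name for name, pts in collections.items()
            if any(query.contains(lon, lat) for lon, lat in pts)]

print(selected)
print(relevant)
```

Here the collection spanning Tokyo and Hong Kong is selected even though none of its items lies in the query region, because its single rectangle covers the empty space between the two cities. This is the kind of false positive that more selective summaries aim to reduce.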
The Third International Workshop on Location and the Web (LocWeb 2010) focuses on research and development that targets the intersection of Internet-enabled location-aware and/or located devices, and services based on Web technologies and Web architecture. The rapid rise of multi-sensory mobile devices and Internet-enabled "things" equipped with sensors and ubiquitous connectivity opens new possibilities and provides the foundations to capture, share and use Web services and applications in ways which go beyond the traditional scenarios of stationary or even mobile computer-like devices. Increasingly, applications will have to bridge the physical world and the Web space, and location is one of the major connecting links. When Web services "surround" users, designers have to address the challenges of scalability and interoperability on the Web, and they also have to look at policy, regulatory, and legislative responses to the privacy and security challenges created by something as sensitive as location information.
{"title":"LocWeb 2010: Third International Workshop on Location and the Web","authors":"Erik Wilde, Susanne CJ Boll, Johannes Schöning","doi":"10.1145/1899662.1899663","DOIUrl":"https://doi.org/10.1145/1899662.1899663","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"46 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"114110153","RegionNum":0,"Total":0}
We introduce an iPhone / iPod touch timetable application named "Eki.Locky" that uses a WiFi location system. The application adopts a UGC (User Generated Content) approach to collect timetable information and WiFi access point (AP) information from the public. Since the service started in October 2009, Eki.Locky has been used by over 440,000 people, the posted timetable information covers 98% of all stations in Japan, and 350,000 pieces of WiFi AP information have been collected. In addition, in June 2010 we launched a new version of the application named "TimeTable.Locky", which supports any timetable, such as those for buses and airplanes. TimeTable.Locky has also been used by a large number of people and has collected over 23,000 pieces of timetable information.
{"title":"TimeTable.Locky: nation wide WiFi location information system based on user contributed information","authors":"Motoki Yano, K. Kaji, Nobuo Kawaguchi","doi":"10.1145/1899662.1899669","DOIUrl":"https://doi.org/10.1145/1899662.1899669","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"696 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"133970279","RegionNum":0,"Total":0}
Gwan Jang, Keun-Chan Park, Kyung-min Kim, Yoonjae Jeong, Sung-Hyon Myaeng
In providing a service to mobile users, it is critical to know what types of information they look for in association with geo-referenced entities that may be extractable from queries or contexts. While understanding high-level user intentions in accessing the Web, such as informational, navigational, and transactional, is useful, a finer-grained classification of user interests would further help adapt mobile search results to user intentions. Our research focuses on understanding what aspects of geo-referenced entities are mentioned in user queries, in an attempt to create a model of user intent in geo-referenced Web searching. By collecting and analyzing geo-referenced questions posed to operational question answering systems, we delineated major aspects of non-topical information that people seek in association with geographic information. The identified aspects were further conceptualized to develop a user interest model with three dimensions, which was validated with two sets of data. The model can serve as a basis for identifying a user's intent in a mobile search context as well as for classifying geo-related text to be retrieved by its aspectual category.
{"title":"What aspects do people search in geo-referenced text?","authors":"Gwan Jang, Keun-Chan Park, Kyung-min Kim, Yoonjae Jeong, Sung-Hyon Myaeng","doi":"10.1145/1899662.1899666","DOIUrl":"https://doi.org/10.1145/1899662.1899666","PeriodicalId":320466,"journal":{"name":"International Workshop on Location and the Web","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130362864","RegionNum":0,"Total":0}