Noise Prediction for Geocoding Queries using Word Geospatial Embedding and Bidirectional LSTM
Tin Vu, Solluna Liu, Renzhong Wang, Kumarswamy Valegerepura
Proceedings of the 28th International Conference on Advances in Geographic Information Systems (2020-11-03)
DOI: 10.1145/3397536.3422201
Citations: 2
Abstract
User geocoding queries in map applications often contain noisy tokens, such as typos in street or city names, wrong postal codes, and redundant words introduced by copy-paste actions. The problem grows with the rapid adoption of mobile devices, where user input errors are inevitable. If noisy tokens are passed as-is to the downstream query processing components, they may cause the search to fail, returning either no results or irrelevant ones. Therefore, noisy tokens in geocoding queries should be recognized and handled properly before the search is executed. In this paper, a deep-learning-based noise prediction model for geocoding queries is proposed. It combines a novel Word Geospatial Embedding (WGE) with a Bidirectional LSTM sequence tagging model. The proposed WGE is the first language model that allows geospatial semantics to be encoded into vector representations, enabling geo-related machine learning and deep learning models to make spatially aware predictions.
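The abstract frames noise prediction as sequence tagging: each query token receives a label, and tokens predicted to be noise can be handled before the query reaches the search components. The pure-Python sketch below illustrates that framing with a bidirectional LSTM forward pass. It is not the authors' model. The two-label scheme (`VALID`/`NOISE`), the layer sizes, and the names `LSTMCell`, `BiLSTMTagger`, and `strip_noise` are hypothetical; the weights are random and untrained; and the random embedding table is only a placeholder where WGE vectors would go, since the abstract does not describe how WGE is built.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]
LABELS = ("VALID", "NOISE")  # hypothetical two-label tagging scheme


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """One-directional LSTM; the i, f, o, g gates share one stacked weight matrix."""

    def __init__(self, in_dim: int, hidden: int, rng: random.Random):
        self.hidden = hidden
        # Weights act on the concatenation [x; h_prev]: 4 * hidden rows, one block per gate.
        self.w = rand_matrix(4 * hidden, in_dim + hidden, rng)
        self.b = [0.0] * (4 * hidden)

    def run(self, xs: List[Vector]) -> List[Vector]:
        H = self.hidden
        h, c = [0.0] * H, [0.0] * H
        outputs = []
        for x in xs:
            z = [a + b for a, b in zip(matvec(self.w, list(x) + h), self.b)]
            i = [sigmoid(v) for v in z[0:H]]          # input gate
            f = [sigmoid(v) for v in z[H:2 * H]]      # forget gate
            o = [sigmoid(v) for v in z[2 * H:3 * H]]  # output gate
            g = [math.tanh(v) for v in z[3 * H:]]     # candidate cell state
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class BiLSTMTagger:
    """Hypothetical untrained BiLSTM tagger: embedding -> BiLSTM -> per-token softmax."""

    def __init__(self, vocab: List[str], emb_dim: int = 8, hidden: int = 6, seed: int = 0):
        rng = random.Random(seed)
        # Placeholder for WGE: random vectors, NOT geospatially trained embeddings.
        self.emb: Dict[str, Vector] = {
            w.lower(): [rng.uniform(-1, 1) for _ in range(emb_dim)] for w in vocab
        }
        self.unk = [0.0] * emb_dim  # vector for out-of-vocabulary tokens
        self.fwd = LSTMCell(emb_dim, hidden, rng)
        self.bwd = LSTMCell(emb_dim, hidden, rng)
        self.out = rand_matrix(len(LABELS), 2 * hidden, rng)

    def predict_proba(self, tokens: List[str]) -> List[Vector]:
        xs = [self.emb.get(t.lower(), self.unk) for t in tokens]
        hf = self.fwd.run(xs)
        hb = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with token order
        probs = []
        for a, b in zip(hf, hb):
            logits = matvec(self.out, a + b)  # concatenated left and right context
            m = max(logits)
            exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
            total = sum(exps)
            probs.append([e / total for e in exps])
        return probs

    def tag(self, tokens: List[str]) -> List[str]:
        return [LABELS[max(range(len(LABELS)), key=lambda k: p[k])]
                for p in self.predict_proba(tokens)]


def strip_noise(tokens: List[str], tags: List[str]) -> List[str]:
    """Drop tokens tagged as noise before the query is sent to search."""
    return [t for t, tag in zip(tokens, tags) if tag != "NOISE"]


query = "123 main st seattle wa 98101 copy".split()
tagger = BiLSTMTagger(vocab=query)
tags = tagger.tag(query)
print(list(zip(query, tags)))
print(strip_noise(query, tags))
```

Because the weights are untrained, the printed tags are arbitrary; the sketch only shows the data flow. Running a second LSTM over the reversed tokens lets each token's label depend on both its left and right context, which is the usual motivation for a bidirectional tagger.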