An Active Transfer Learning Method Combining Uncertainty with Diversity for Chinese Address Resolution
Yuwei Hu, Xueyuan Zheng, Ping Zong
Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17
DOI: 10.1145/3581807.3581902
Citations: 0
Abstract
Chinese address resolution (CAR) is a key step in geocoding technology, and its results directly affect the service quality of address-based applications. Deep learning models have been widely used for the CAR task, but they require abundant annotated address data to achieve satisfactory performance. In this paper, we propose an active transfer learning method for CAR that combines uncertainty with diversity. Its main goals are to reduce the annotation required for unlabeled addresses in the target region and to improve the utilization of labeled data in the source region. Considering the correlation among Chinese addresses, we propose a method for clustering unlabeled addresses based on feature words, which are mined from the address data with a latent Dirichlet allocation (LDA) model, to reflect the distribution of the addresses. A comprehensive sample strategy combining uncertainty with diversity (CSSCUD) is constructed as a metric for selecting training samples from the target region; by jointly considering informativeness and distribution in the feature-word space, it selects highly valuable samples in each batch. Experiments on address data from two different regions show that, with the same number of labeled training samples, the proposed active transfer learning method achieves higher resolution accuracy than various baselines, which indicates that the method is effective and practical for CAR.
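The abstract does not give the CSSCUD formula, so the following is only a generic sketch of how a batch-mode selector can combine uncertainty with diversity. It assumes that the address tagger outputs a per-token label distribution, that uncertainty is the mean token entropy, and that each address is represented by a vector over feature words (for example, counts of LDA-mined feature words). The greedy scoring rule, the cosine-based redundancy term, and the names `select_batch` and `lam` are hypothetical choices for illustration, not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_uncertainty(token_probs: List[Sequence[float]]) -> float:
    """Mean token-level entropy of one address (assumes a per-token tagger)."""
    return sum(entropy(p) for p in token_probs) / len(token_probs)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_batch(
    uncertainty: Dict[str, float],
    features: Dict[str, List[float]],
    batch_size: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick `batch_size` addresses, trading off uncertainty and diversity.

    Hypothetical score:
    score(i) = lam * u(i) / max_u + (1 - lam) * (1 - max_{j in batch} cos(x_i, x_j))
    """
    # Normalize uncertainty to [0, 1] so it is on the same scale as the diversity term.
    u_max = max(uncertainty.values(), default=0.0) or 1.0
    selected: List[str] = []
    candidates = set(uncertainty)

    def score(i: str) -> float:
        # Redundancy = similarity to the closest address already in the batch.
        redundancy = max((cosine(features[i], features[j]) for j in selected), default=0.0)
        return lam * uncertainty[i] / u_max + (1 - lam) * (1 - redundancy)

    while candidates and len(selected) < batch_size:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the redundancy term vanishes and the selector reduces to plain uncertainty sampling, which can fill a batch with near-duplicate addresses. Lowering `lam` pushes later picks toward addresses that differ from those already chosen in the feature-word space. The clustering step described in the abstract is not reproduced here.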