A Novel Chinese Address Segmentation Method with Self-growth Feature
Authors: Yong Zhang, Yingqiu Li, Fengkun Li, Yujun Shen, Yanxin Xu
Journal: Advances in Engineering Technology Research, Vol. 66, No. 1 (journal article), published 2023-09-28
DOI: 10.56028/aetr.8.1.169.2023 (https://doi.org/10.56028/aetr.8.1.169.2023)
Citations: 0
Abstract
Chinese Address Segmentation (CAS) is a crucial step that can substantially improve the performance, accuracy, and reliability of geocoding. It is also difficult: Chinese addresses lack explicit word boundaries and have complex grammatical and semantic features. To address this challenge, we propose a novel CAS method that starts from scratch, with no pre-installed knowledge of Chinese addresses. Instead, it dynamically grows its knowledge library while dividing addresses into address elements, by exploiting contextual information and comparing addresses with one another. The approach relies on neither Chinese-language dictionaries nor address-element dictionaries, and it does not depend on address statistics. The knowledge library is extracted automatically and organized as a tree. This design allows the method to segment addresses from any region of China, including regions with intricate address expressions such as the Inner Mongolia Autonomous Region. Experimental results show that the method achieves high precision in address segmentation.
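The abstract does not spell out the algorithm. As a rough illustration of what a self-growing, dictionary-free tree of address elements can look like, the sketch below uses a radix (compressed prefix) tree: each new address is compared with the addresses already stored, and an edge is split wherever two addresses diverge, so shared prefixes such as a province or city name emerge as separate elements. This is a hypothetical stand-in chosen for illustration, not the authors' method; the names `Node`, `insert`, and `segment`, the prefix-splitting rule, and the example addresses are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical representation: each edge label is a candidate address element.
    children: dict[str, Node] = field(default_factory=dict)


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def insert(root: Node, address: str) -> None:
    """Grow the tree with one address by comparing it with stored addresses.

    Wherever the new address diverges partway along an existing edge, that
    edge is split, so the shared part becomes its own element.
    """
    node, rest = root, address
    while rest:
        for label, child in list(node.children.items()):
            k = common_prefix_len(label, rest)
            if k == 0:
                continue
            if k < len(label):
                # Split "label" into label[:k] (shared) and label[k:] (old tail).
                mid = Node(children={label[k:]: child})
                del node.children[label]
                node.children[label[:k]] = mid
                child = mid
            node, rest = child, rest[k:]
            break
        else:
            # Nothing shared with any known element: store the rest as one edge.
            node.children[rest] = Node()
            return


def segment(root: Node, address: str) -> list[str]:
    """Split an address into the edge labels along its path in the tree."""
    parts: list[str] = []
    node, rest = root, address
    while rest:
        for label, child in node.children.items():
            if rest.startswith(label):
                parts.append(label)
                node, rest = child, rest[len(label):]
                break
        else:
            parts.append(rest)  # unseen tail is kept as a single element
            break
    return parts


if __name__ == "__main__":
    root = Node()
    for a in [
        "内蒙古自治区呼和浩特市新城区",
        "内蒙古自治区呼和浩特市赛罕区",
        "内蒙古自治区包头市昆都仑区",
        "内蒙古自治区包头市青山区",
    ]:
        insert(root, a)
    print(segment(root, "内蒙古自治区呼和浩特市新城区"))
    # ['内蒙古自治区', '呼和浩特市', '新城区']
    print(segment(root, "内蒙古自治区乌海市海勃湾区"))
    # ['内蒙古自治区', '乌海市海勃湾区']
```

Because elements emerge only from comparisons, an address whose prefix has never been seen next to a sibling stays coarse (乌海市海勃湾区 comes back as one element), and purely character-level splitting can cut inside an element when two names share leading characters (新城区 and 新华区 would split after 新). The abstract says the authors also exploit contextual information, which this sketch does not model.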