A Novel Chinese Address Segmentation Method with Self-growth Feature

Yong Zhang, Yingqiu Li, Fengkun Li, Yujun Shen, Yanxin Xu
Journal: Advances in Engineering Technology Research
Published: 2023-09-28
DOI: 10.56028/aetr.8.1.169.2023 (https://doi.org/10.56028/aetr.8.1.169.2023)

Abstract

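Chinese Address Segmentation (CAS) is a crucial step that can greatly improve the performance, accuracy, and reliability of geocoding. It is challenging, however, because Chinese addresses lack obvious word boundaries and have complex grammatical and semantic features. To address this challenge, we propose a novel CAS method that starts from scratch, without relying on any pre-installed knowledge about Chinese addresses. Instead, it dynamically grows its knowledge library by leveraging contextual information and comparing addresses while dividing them into address elements. The approach relies on neither Chinese-language dictionaries nor address-element dictionaries, and it does not depend on address statistics. The knowledge library is extracted automatically and organized as a tree data structure. As a result, the method can segment addresses from any area of China, including regions with intricate address expressions, such as the Inner Mongolia Autonomous Region. Experimental results demonstrate that the method achieves high precision in address segmentation.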
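The abstract does not spell out the algorithm, so the following is only an illustrative sketch, not the authors' method. It shows one simple way a tree-shaped knowledge library could start empty and grow by comparing addresses: a plain radix tree (compressed trie) whose edges split wherever a new address diverges from addresses seen earlier. The names `Node`, `insert`, and `segment`, and the sample addresses, are hypothetical. The split points are merely the places where addresses differ, so they will not always match true address-element boundaries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a radix tree; `label` is the text on the edge leading to it."""
    label: str = ""
    # Children are keyed by the first character of their label.
    children: dict[str, "Node"] = field(default_factory=dict)


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def insert(root: Node, address: str) -> None:
    """Grow the tree with one address, splitting edges where addresses diverge."""
    node, rest = root, address
    while rest:
        child = node.children.get(rest[0])
        if child is None:
            # Nothing known starts with this character: store the remainder whole.
            node.children[rest[0]] = Node(rest)
            return
        k = common_prefix_len(child.label, rest)
        if k < len(child.label):
            # The new address diverges inside this edge: split the edge at k.
            tail = Node(child.label[k:], child.children)
            child.label = child.label[:k]
            child.children = {tail.label[0]: tail}
        node, rest = child, rest[k:]


def segment(root: Node, address: str) -> list[str]:
    """Cut an address along the edges of the tree learned so far."""
    parts, node, rest = [], root, address
    while rest:
        child = node.children.get(rest[0])
        if child is None or not rest.startswith(child.label):
            parts.append(rest)  # unknown remainder is kept as one piece
            break
        parts.append(child.label)
        node, rest = child, rest[len(child.label):]
    return parts


# Assumed sample data: the tree starts empty and learns from three addresses.
root = Node()
for addr in ["北京市海淀区中关村大街1号", "北京市海淀区学院路5号", "北京市朝阳区建国路8号"]:
    insert(root, addr)

print(segment(root, "北京市海淀区知春路2号"))  # ['北京市', '海淀区', '知春路2号']
```

In this sketch the city and district boundaries emerge only after the tree has seen addresses that differ at those positions, which is a loose analogue of the self-growth idea; street and house-number elements would need further comparisons or contextual cues that the abstract does not detail.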