Automation of data preparation for mapping using natural language processing systems

A. Kolesnikov, Egor Plitchenko, Maria Kropacheva
Journal: InterCarto InterGIS
Publication type: Journal Article
Publication date: 2022-01-01
DOI: 10.35595/2414-9179-2022-1-28-659-669 (https://doi.org/10.35595/2414-9179-2022-1-28-659-669)
Citation count: 1

Abstract

Current information technology makes it possible to automate the processing of data types that previously only a specialist could handle. One example is natural language processing, which implements sentiment analysis, machine translation, and question answering systems. For producing cartographic and geoinformation works, two groups of methods are of the greatest interest: named entity extraction, which pulls geographical names out of unstructured text, and named entity linking, which creates logical links between the extracted names of spatial objects. Passing the extracted names through a local or networked geocoding service database makes it possible to automate the creation of map layers in a geographic information system from text messages. The article describes the most popular approaches to named entity extraction and their software implementations, using biographies and works of Siberian writers as example texts. Rule-based methodologies, maximum entropy models, and convolutional neural networks are analyzed. To assess the quality of extracting geographical names and objects from text, the authors propose, in addition to the standard F1-score, a further evaluation method that takes a larger number of criteria into account and is also based on an error matrix (confusion matrix). Text block markup formats are described that support additional training of the neural network model, in order to improve recognition quality and expand the range of geographical names recognized as named entities.
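The pipeline outlined in the abstract (extract geographical names, geocode them against a local database, score the extraction with the F1-score) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the gazetteer, its approximate coordinates, the sample sentence, the reference annotation, and the function names `extract_place_names` and `precision_recall_f1` are all assumptions made for illustration. The extraction step is a simple rule-based gazetteer lookup, and only the standard F1-score is shown, not the extended evaluation the authors propose.

```python
import re

# Hypothetical local gazetteer: place name -> approximate (lat, lon).
# The coordinates are rough illustrative values, not survey data.
GAZETTEER = {
    "Novosibirsk": (55.03, 82.92),
    "Tomsk": (56.50, 84.97),
    "Irkutsk": (52.29, 104.30),
}


def extract_place_names(text, gazetteer):
    """Rule-based extraction: return gazetteer names that occur in the text."""
    return [
        name
        for name in gazetteer
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    ]


def precision_recall_f1(gold, predicted):
    """Standard precision, recall and F1 from error-matrix counts."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # names extracted correctly
    fp = len(predicted - gold)  # names extracted but absent from the reference
    fn = len(gold - predicted)  # reference names that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up sample sentence and manual reference annotation.
text = (
    "The writer was born in Tomsk and later moved to Novosibirsk, "
    "where he wrote about the Ob river."
)
gold = {"Tomsk", "Novosibirsk", "Ob"}

found = extract_place_names(text, GAZETTEER)

# Geocoding step: look each extracted name up in the local gazetteer,
# producing point records that could be loaded as a map layer.
points = [{"name": n, "lat": GAZETTEER[n][0], "lon": GAZETTEER[n][1]} for n in found]

precision, recall, f1 = precision_recall_f1(gold, found)
print(points)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy example the river name "Ob" is missing from the gazetteer, so it counts as a false negative. That illustrates why a rule-based lookup is limited by its dictionary and why trainable models are also considered in the article.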