Discovering of Personal Name Prefix Patterns in Thai Researcher Corpus and Its Application

Nongnuch Ketui, Nattapong Tongtep, T. Theeramunkong
2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2020-06-01
DOI: 10.1109/ecti-con49241.2020.9158214 (https://doi.org/10.1109/ecti-con49241.2020.9158214)

Abstract

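In information extraction, a person's name is one of the most important named entities to extract, and it feeds downstream tasks such as question answering and summarization. The boundary of a person's name is often ambiguous, however, because names are written in several different patterns across online public data sources such as news, events, and researcher corpora. Name prefixes can serve as clue words or phrases for extracting, identifying, and unifying person names. This paper proposes a name prefix discovery framework that collects an integrated researcher corpus from various data sources and extracts name prefix patterns. The framework has four main functions: collecting data from the sources, tagging entities, preprocessing researchers' names, and finding personal name prefix patterns. The work gathers six data sources and focuses on ten entities related to the research domain. Preprocessing uses three sub-processes to produce each researcher's name. As a result, 408 personal name prefixes were extracted. In addition, an API for extracting a person's or researcher's name was implemented with the Flask framework for Python. The output of this work can support researcher name identification in the integrated researcher corpus.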
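To make the idea of using name prefixes as clue words concrete, here is a minimal sketch of prefix-based name splitting. It is not the authors' implementation: the function names, the greedy longest-match strategy, and the short prefix list are all assumptions chosen for illustration, and the list is a handful of common Thai and English titles rather than the 408 prefixes reported above.

```python
from collections import Counter

# Hypothetical prefix list for illustration only (not the paper's 408 prefixes).
KNOWN_PREFIXES = ["ศ.", "รศ.", "ผศ.", "ดร.", "นาย", "นางสาว", "นาง", "Dr.", "Prof."]


def split_name_prefix(raw, prefixes=KNOWN_PREFIXES):
    """Peel stacked prefixes off the front of a name string.

    Returns (prefix_pattern, bare_name), where prefix_pattern is a tuple
    of the prefixes found, in order of appearance.
    """
    # Try longer prefixes first so that "นางสาว" wins over "นาง".
    ordered = sorted(prefixes, key=len, reverse=True)
    rest = raw.strip()
    found = []
    matched = True
    while matched:
        matched = False
        for prefix in ordered:
            # Never consume the whole string: a name must remain.
            if rest.startswith(prefix) and len(rest) > len(prefix):
                found.append(prefix)
                rest = rest[len(prefix):].lstrip()
                matched = True
                break
    return tuple(found), rest


def count_prefix_patterns(raw_names, prefixes=KNOWN_PREFIXES):
    """Count how often each stacked prefix pattern occurs in a list of names."""
    patterns = Counter()
    for raw in raw_names:
        pattern, _ = split_name_prefix(raw, prefixes)
        if pattern:
            patterns[pattern] += 1
    return patterns


if __name__ == "__main__":
    samples = [
        "ผศ.ดร.สมชาย ใจดี",
        "ผศ.ดร.สมหญิง รักเรียน",
        "นางสาวสมหญิง รักเรียน",
        "Prof. Dr. Somchai Jaidee",
    ]
    for sample in samples:
        print(split_name_prefix(sample))
    print(count_prefix_patterns(samples))
```

Greedy matching of this kind will misfire when a given name happens to begin with the same characters as a title, so a real system would need additional checks. A function like `split_name_prefix` could be exposed through a Flask route, but the endpoint design of the API mentioned in the abstract is not described here.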