泰语研究者语料中人名前缀模式的发现及其应用

2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) Pub Date : 2020-06-01 DOI:10.1109/ecti-con49241.2020.9158214

Nongnuch Ketui, Nattapong Tongtep, T. Theeramunkong

{"title":"泰语研究者语料中人名前缀模式的发现及其应用","authors":"Nongnuch Ketui, Nattapong Tongtep, T. Theeramunkong","doi":"10.1109/ecti-con49241.2020.9158214","DOIUrl":null,"url":null,"abstract":"In the context of information extraction, a person’s name is one of the important named entities to be extracted which are applied to the question-answering and summarizing tasks. However, the boundary of a person’s name is still ambiguous since there are several writing patterns of a person’s name from online public data sources such as news, events, and researcher corpora. To extract, identify, and unify the person’s name, discovering the name prefix can be applied as clue words or phrases to such processes. In this paper, the name prefix discovering framework is proposed for collecting the integrated researcher corpus from various data sources and extracting name prefix patterns. Four main functions of the proposed framework are collecting data from data sources, tagging entities, preprocessing the researcher’s names, and finding the pattern of the personal name prefix. In this work, six data sources are gathered and ten entities related to the research domain are focused. The preprocessing data uses three sub-processes to provide the researcher’s name. The result shows that the 408 personal name prefixes are extracted. Moreover, the API development for extracting a person or researcher’s name is implemented using a Flask Python framework. The output of this work can be used to support the researcher’s name identification from the integrated researcher corpus.","PeriodicalId":371552,"journal":{"name":"2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)","volume":"2008 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Discovering of Personal Name Prefix Patterns in Thai Researcher Corpus and Its Application\",\"authors\":\"Nongnuch Ketui, Nattapong Tongtep, T. Theeramunkong\",\"doi\":\"10.1109/ecti-con49241.2020.9158214\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the context of information extraction, a person’s name is one of the important named entities to be extracted which are applied to the question-answering and summarizing tasks. However, the boundary of a person’s name is still ambiguous since there are several writing patterns of a person’s name from online public data sources such as news, events, and researcher corpora. To extract, identify, and unify the person’s name, discovering the name prefix can be applied as clue words or phrases to such processes. In this paper, the name prefix discovering framework is proposed for collecting the integrated researcher corpus from various data sources and extracting name prefix patterns. Four main functions of the proposed framework are collecting data from data sources, tagging entities, preprocessing the researcher’s names, and finding the pattern of the personal name prefix. In this work, six data sources are gathered and ten entities related to the research domain are focused. The preprocessing data uses three sub-processes to provide the researcher’s name. The result shows that the 408 personal name prefixes are extracted. Moreover, the API development for extracting a person or researcher’s name is implemented using a Flask Python framework. The output of this work can be used to support the researcher’s name identification from the integrated researcher corpus.\",\"PeriodicalId\":371552,\"journal\":{\"name\":\"2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)\",\"volume\":\"2008 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ecti-con49241.2020.9158214\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ecti-con49241.2020.9158214","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在信息抽取中，人名是抽取的重要命名实体之一，用于问答和总结任务。然而，由于从新闻、事件和研究人员语料库等在线公共数据源中存在几种人名的书写模式，人名的边界仍然是模糊的。为了提取、识别和统一人名，发现姓名前缀可以作为线索词或短语应用于这些过程。本文提出了一个名称前缀发现框架，用于从各种数据源中收集集成研究者语料库并提取名称前缀模式。该框架的四个主要功能是:从数据源中收集数据、标记实体、对研究人员姓名进行预处理以及查找个人姓名前缀的模式。在这项工作中，收集了六个数据源，并集中了与研究领域相关的十个实体。预处理数据使用三个子过程来提供研究人员的姓名。结果表明，提取了408个个人姓名前缀。此外，用于提取个人或研究人员姓名的API开发是使用Flask Python框架实现的。这项工作的输出可用于支持从集成的研究人员语料库中识别研究人员的姓名。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Discovering of Personal Name Prefix Patterns in Thai Researcher Corpus and Its Application

In the context of information extraction, a person’s name is one of the important named entities to be extracted which are applied to the question-answering and summarizing tasks. However, the boundary of a person’s name is still ambiguous since there are several writing patterns of a person’s name from online public data sources such as news, events, and researcher corpora. To extract, identify, and unify the person’s name, discovering the name prefix can be applied as clue words or phrases to such processes. In this paper, the name prefix discovering framework is proposed for collecting the integrated researcher corpus from various data sources and extracting name prefix patterns. Four main functions of the proposed framework are collecting data from data sources, tagging entities, preprocessing the researcher’s names, and finding the pattern of the personal name prefix. In this work, six data sources are gathered and ten entities related to the research domain are focused. The preprocessing data uses three sub-processes to provide the researcher’s name. The result shows that the 408 personal name prefixes are extracted. Moreover, the API development for extracting a person or researcher’s name is implemented using a Flask Python framework. The output of this work can be used to support the researcher’s name identification from the integrated researcher corpus.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)

自引率

0.00%

发文量