Named Entity Recognition using CRF with Active Learning Algorithm in English Texts

2022 6th International Conference on Electronics, Communication and Aerospace Technology · Published: 2022-12-01 · DOI: 10.1109/ICECA55336.2022.10009592

B. VeeraSekharReddy, Koppula Srinivas Rao, Neeraja Koppula


Citations: 6

Abstract

Many Natural Language Processing (NLP) applications rely on Named Entity Recognition (NER) to sift through large volumes of unstructured text and find the information they need. NER is the task of assigning labels to words in a text so that they can be sorted into categories. State-of-the-art models now achieve improved results even with limited resources, making language models increasingly valuable across a variety of NLP tasks. This article presents a novel approach to NER that combines a Conditional Random Field (CRF) with an Active Learning (AL) procedure.

The AL-CRF model operates as follows. First, the samples are clustered with K-Means. Stratified sampling is then performed on the resulting clusters, and the sampled examples are used to train the base CRF classifier. Next, an entropy-based selection phase begins: the examples with the highest entropy values are added to the training set. The CRF classifier is retrained on the expanded training set, and the procedure is repeated. This cycle of training and selection continues until the harmonic mean of precision and recall (the F1 score) stabilises, yielding the final NER model. The primary benefit of the method is that it is more efficient and requires fewer manually labelled training samples, which may make the procedure more reliable and cost-efficient.
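The abstract does not say how sentences are represented for K-Means or how large the initial seed set is. The following minimal sketch, in dependency-free Python, assumes each sentence has already been converted into a numeric feature vector (for example, bag-of-words counts) and shows one way to cluster the pool and draw a stratified seed set for training the first CRF. The function names, the farthest-first initialisation, and the proportional quota rule are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's K-Means with farthest-first initialisation; returns a cluster id per point."""
    # Farthest-first init (an assumption): start at the first point, then repeatedly
    # add the point farthest from every centroid chosen so far. Deterministic.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign


def stratified_seed(assign, budget, seed=0):
    """Pick roughly `budget` indices, sampling each cluster in proportion to its size."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for idx, c in enumerate(assign):
        clusters[c].append(idx)
    chosen = []
    for members in clusters.values():
        # Hypothetical rule: at least one sample per cluster so every stratum is represented.
        quota = max(1, round(budget * len(members) / len(assign)))
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return sorted(chosen)
```

The returned indices would be sent for manual annotation and used as the first CRF training set; the remaining sentences form the unlabelled pool for later selection rounds.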
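The entropy-based selection step can be sketched as below, assuming the current CRF exposes per-token marginal label probabilities (sklearn-crfsuite's `CRF.predict_marginals`, for instance, returns one dict of label probabilities per token). The abstract does not state how token entropies are combined into a sentence score; averaging them is an assumption made here so that long sentences are not favoured simply for their length. The function names are hypothetical.

```python
import math


def token_entropy(dist):
    """Shannon entropy (nats) of one token's label distribution, e.g. {'B-PER': 0.7, 'O': 0.3}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def sequence_entropy(marginals):
    """Mean per-token entropy of a sentence; `marginals` is a list of per-token dicts."""
    if not marginals:
        return 0.0
    return sum(token_entropy(d) for d in marginals) / len(marginals)


def select_most_uncertain(pool_marginals, k):
    """Return indices of the k pool sentences with the highest entropy, most uncertain first."""
    scores = [sequence_entropy(m) for m in pool_marginals]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Summing instead of averaging would favour longer sentences; which variant works better is an empirical question the abstract does not settle.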
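The stopping rule can be expressed as a check on the history of F1 scores (the harmonic mean of precision and recall) measured after each retraining round. The abstract only says the loop runs until the harmonic mean stabilises; the `window` and `tol` values below are hypothetical parameters chosen for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def has_stabilised(history, window=3, tol=0.005):
    """True once the last `window` F1 values lie within `tol` of each other."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) <= tol
```

A driver loop would retrain the CRF, append the new held-out F1 to `history`, and stop once `has_stabilised(history)` returns `True`.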
