Named Entity Recognition using CRF with Active Learning Algorithm in English Texts

B. VeeraSekharReddy, Koppula Srinivas Rao, Neeraja Koppula
DOI: 10.1109/ICECA55336.2022.10009592
Published in: 2022 6th International Conference on Electronics, Communication and Aerospace Technology (ICECA), December 2022
Citations: 6

Abstract

Many Natural Language Processing (NLP) applications rely on Named Entity Recognition (NER) to sift through large volumes of unstructured text and find the information they need. NER is the task of assigning labels to the words in a text so that they can be sorted into predefined categories. State-of-the-art language models achieve improved results even with limited resources, which makes them increasingly valuable across a variety of NLP tasks. This article presents a novel approach to NER that combines a Conditional Random Field (CRF) with an Active Learning (AL) procedure. The resulting AL-CRF model operates as follows. First, the samples are clustered with K-Means, and stratified sampling over the resulting clusters selects the samples used to train the initial CRF classifier. Next, entropy-based selection begins: the examples with the highest entropy values, that is, those on which the current classifier is least certain, are added to the training set. The CRF classifier is then retrained on the enlarged training set, and the procedure is repeated. This cycle of learning and selection continues until the harmonic mean of precision and recall (the F1 score) stabilises, yielding the final NER model. The primary benefit of the method is that it is more efficient and requires fewer manually labelled training samples, which can make the procedure more reliable and cost-efficient.
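The AL-CRF loop described in the abstract can be sketched in Python as follows. This is a minimal, standard-library-only illustration, not the authors' implementation, and it rests on assumptions the abstract does not settle: sentences are represented as fixed-length numeric vectors for K-Means; a sentence's "entropy" is the summed per-token entropy of the CRF's label marginals; and "the harmonic mean stabilises" means the F1 score changes by less than `tol` for `patience` consecutive rounds. All names and parameters (`al_crf`, `train_fn`, `marginals_fn`, `f1_fn`, `k`, `seed_size`, `batch`, `tol`, `patience`) are hypothetical. The CRF itself is left as pluggable callables; in practice `train_fn` might fit a linear-chain CRF with a library such as sklearn-crfsuite and `marginals_fn` might wrap its `CRF.predict_marginals`, while the toy run at the end substitutes dummy stand-ins so the control flow can be executed.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means over numeric vectors; returns a cluster id per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def stratified_seed(assign, n, seed=0):
    """Draw about n indices, proportional to cluster size (at least one per cluster)."""
    rng = random.Random(seed)
    clusters = {}
    for i, c in enumerate(assign):
        clusters.setdefault(c, []).append(i)
    picked = []
    for members in clusters.values():
        quota = max(1, round(n * len(members) / len(assign)))
        picked.extend(rng.sample(members, min(quota, len(members))))
    return sorted(picked)


def sequence_entropy(marginals):
    """Total Shannon entropy (nats) of one sentence's per-token label marginals.

    `marginals` is a list of {label: probability} dicts, one per token: the shape
    a linear-chain CRF's forward-backward marginals take. Assumption: token
    entropies are summed; the abstract does not say how they are aggregated.
    """
    return -sum(p * math.log(p) for tok in marginals for p in tok.values() if p > 0)


def f1_stable(history, tol=1e-3, patience=2):
    """Assumed stopping rule: the last `patience` F1 changes are all below `tol`."""
    if len(history) <= patience:
        return False
    recent = history[-(patience + 1):]
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))


def al_crf(features, train_fn, marginals_fn, f1_fn,
           k=3, seed_size=6, batch=2, max_rounds=20):
    """Hypothetical AL-CRF driver; the CRF is supplied through the three callables."""
    # K-Means clusters, then a stratified seed set for training the first CRF.
    labelled = stratified_seed(kmeans(features, k), seed_size)
    seen = set(labelled)
    pool = [i for i in range(len(features)) if i not in seen]
    model = train_fn(labelled)
    history = []
    for _ in range(max_rounds):
        history.append(f1_fn(model))  # F1: harmonic mean of precision and recall
        if f1_stable(history) or not pool:
            break
        # Entropy-based selection: query the least certain pool sentences first.
        pool.sort(key=lambda i: sequence_entropy(marginals_fn(model, i)), reverse=True)
        chosen, pool = pool[:batch], pool[batch:]
        labelled += chosen  # a human annotator would label these sentences here
        model = train_fn(labelled)  # retrain on the enlarged training set
    return model, labelled, history


# Toy run with stand-ins (not a real CRF): 1-D "sentence vectors" in two clusters,
# a "model" that just records its training indices, fake marginals that make odd
# indices uncertain, and a constant F1 so the loop stops once F1 is flat for two rounds.
points = [(float(x),) for x in (0, 1, 2, 3, 4, 100, 101, 102, 103, 104)]
model, labelled, history = al_crf(
    points,
    train_fn=frozenset,
    marginals_fn=lambda m, i: [{"B-PER": 0.5, "O": 0.5}] if i % 2 else [{"O": 1.0}],
    f1_fn=lambda m: 0.9,
    k=2, seed_size=4, batch=2,
)
print(len(labelled), history)  # → 8 [0.9, 0.9, 0.9]
```

Summing token entropies favours longer sentences; averaging per token is a common alternative if that bias is unwanted. Recomputing marginals for the whole pool each round is the simplest choice here but can become costly for large pools.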