Supervised named entity recognition in Assamese language

G. Talukdar, Pranjal Protim Borah, Arup Baruah
2014 International Conference on Contemporary Computing and Informatics (IC3I), November 2014. DOI: 10.1109/IC3I.2014.7019728
Citations: 6

Abstract

Nouns play an important role in every natural language. Proper nouns, a subcategory of nouns, represent the names of persons, locations, organizations, and so on. Named Entity Recognition (NER) is the task of recognizing the proper nouns in a text and classifying them into categories such as person, location, organization, and other. It is an essential step in many natural language processing applications because it makes information extraction easier. For most Indian languages, NER has been performed using rule-based, supervised, and unsupervised approaches. The target language of this work is Assamese, which is spoken by most people in the north-eastern part of India, particularly in Assam. For Assamese, NER has been performed using rule-based and suffix-stripping-based approaches. Compared with rule-based approaches, supervised learning techniques are more useful and can be adapted more easily to new domains. This paper reports the first work on Assamese NER using a machine learning technique: Assamese named entities are recognized with a Naïve Bayes classifier. Since feature extraction plays the most important role in achieving good performance with any machine learning technique, our aim is to describe a few important features for Assamese NER and to report the performance of the system using these features.
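The abstract states that each token is classified with a Naïve Bayes classifier over extracted features, but it does not list the features or say which Naïve Bayes variant is used. The following is a minimal sketch under those gaps, not the authors' system: it assumes a multinomial Naïve Bayes with add-one (Laplace) smoothing and a hypothetical feature set (the word, its neighbours, and character prefixes/suffixes of length 1 to 3). The names `token_features`, `NaiveBayesNER`, and `TRAIN` are illustrative, and the training sentences are made-up romanized toy tokens rather than corpus data.

```python
import math
from collections import Counter, defaultdict


def token_features(tokens, i):
    """Hypothetical feature set; the abstract does not list the paper's features."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<BOS>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "<EOS>"
    feats = [f"w={word}", f"prev={prev_word}", f"next={next_word}"]
    # Character prefixes and suffixes of length 1..3 (an assumed stand-in for
    # affix cues such as those used by suffix-stripping approaches).
    for k in (1, 2, 3):
        if len(word) >= k:
            feats.append(f"suf{k}={word[-k:]}")
            feats.append(f"pre{k}={word[:k]}")
    return feats


class NaiveBayesNER:
    """Multinomial Naive Bayes over token features, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # tag -> number of tokens
        self.feat_counts = defaultdict(Counter)  # tag -> feature -> count
        self.feat_totals = Counter()             # tag -> total feature count
        self.vocab = set()                       # all features seen in training

    def fit(self, sentences):
        """sentences: iterable of (tokens, tags) pairs of equal length."""
        for tokens, tags in sentences:
            for i, tag in enumerate(tags):
                self.class_counts[tag] += 1
                for f in token_features(tokens, i):
                    self.feat_counts[tag][f] += 1
                    self.feat_totals[tag] += 1
                    self.vocab.add(f)
        return self

    def predict_token(self, tokens, i):
        """Return argmax over tags of log P(tag) + sum of log P(feature | tag)."""
        feats = token_features(tokens, i)
        n_tokens = sum(self.class_counts.values())
        v = len(self.vocab)
        best_tag, best_score = None, -math.inf
        for tag, count in self.class_counts.items():
            score = math.log(count / n_tokens)
            denom = self.feat_totals[tag] + self.alpha * v
            for f in feats:
                score += math.log((self.feat_counts[tag][f] + self.alpha) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

    def predict(self, tokens):
        # Each token is classified independently (no sequence model).
        return [self.predict_token(tokens, i) for i in range(len(tokens))]


# Made-up romanized toy sentences for illustration only, not corpus data.
TRAIN = [
    (["ram", "guwahatit", "ase"], ["PER", "LOC", "O"]),
    (["shyam", "jorhatit", "ase"], ["PER", "LOC", "O"]),
    (["ram", "ghar", "gol"], ["PER", "O", "O"]),
]

if __name__ == "__main__":
    model = NaiveBayesNER().fit(TRAIN)
    print(model.predict(["shyam", "guwahatit", "gol"]))  # ['PER', 'LOC', 'O']
```

Because Naïve Bayes treats features as conditionally independent given the tag, adding or removing a feature template only changes `token_features`, which makes this kind of setup convenient for the feature comparisons the abstract describes.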