V. Advaith, Anushka Shivkumar, B. S. Sowmya Lakshmi
Parts of Speech Tagging for Kannada and Hindi Languages using ML and DL models
2022 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), published 2022-07-08
DOI: 10.1109/CONECCT55679.2022.9865745
Abstract
Part-of-speech (POS) tagging is a core Natural Language Processing (NLP) task that assigns each word in a text (corpus) to a part of speech based on the word's context. POS tagging for Indian languages has not been widely explored. Kannada is highly inflectional and has one of the most complex and richest sets of linguistic features, which makes developing a POS tagger for a resource-poor language such as Kannada difficult. For Hindi, morphological complexity remains a challenge despite numerous attempts at building a POS tagger for the language. This work develops a POS tagger for both Kannada and Hindi using Machine Learning (ML) and Deep Learning (DL) algorithms. The results are based on experiments on a corpus of around 3 lakh (300,000) unique words for Kannada and Hindi combined. The 17 POS tags used are taken from the BIS (Bureau of Indian Standards) tag set.
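To make the input and output of POS tagging concrete, here is a minimal sketch of a most-frequent-tag baseline. It is not the authors' ML or DL method, which this abstract does not detail. The toy Hindi sentences and the BIS-style labels (N_NN, N_NNP, PR_PRP, V_VM, V_VAUX, PSP, RD_PUNC) are illustrative assumptions rather than data from the paper's corpus. The helper names `train_most_frequent_tag` and `tag` are hypothetical, as is the choice of N_NN as the default tag for unseen words. Because the baseline tags every word independently, it cannot use context, and context is exactly what the ML and DL taggers described above are meant to exploit.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (word, BIS-style tag) pairs.
# Sentences and tag choices are illustrative assumptions, not the paper's corpus.
TRAIN = [
    [("राम", "N_NNP"), ("घर", "N_NN"), ("जाता", "V_VM"), ("है", "V_VAUX"), ("।", "RD_PUNC")],
    [("सीता", "N_NNP"), ("घर", "N_NN"), ("में", "PSP"), ("है", "V_VM"), ("।", "RD_PUNC")],
    [("वह", "PR_PRP"), ("घर", "N_NN"), ("जाता", "V_VM"), ("है", "V_VAUX"), ("।", "RD_PUNC")],
]


def train_most_frequent_tag(sentences):
    """Map each word to the tag it received most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag_label in sentence:
            counts[word][tag_label] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def tag(words, lexicon, default="N_NN"):
    """Tag each word independently; unseen words fall back to an assumed default tag."""
    return [(word, lexicon.get(word, default)) for word in words]


lexicon = train_most_frequent_tag(TRAIN)
print(tag(["वह", "घर", "जाता", "है", "।"], lexicon))
print(tag(["मोहन", "घर", "में", "है", "।"], lexicon))
```

In the second sentence, है is the main verb, but the baseline labels it V_VAUX because that was its most frequent tag in training. The unseen proper noun मोहन also falls back to N_NN. Both errors show why context-aware and morphology-aware models matter for highly inflected languages such as Kannada and Hindi.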