A Machine Learning Approach to Anaphora Resolution in Nepali Language
Apurbalal Senapati, Arun Poudyal, P. Adhikary, Sahana Kaushar, Anmol Mahajan, Baidya Nath Saha
2020 International Conference on Computational Performance Evaluation (ComPE), pp. 436-441, published 2020-07-01
DOI: 10.1109/ComPE49325.2020.9200135 (https://doi.org/10.1109/ComPE49325.2020.9200135)
Citation count: 2
Abstract
In this paper, we attempt a machine learning (ML) approach to anaphora resolution (AR) in the Nepali language. It is one of the first attempts to apply machine learning to anaphora resolution in Nepali, which is a resource-limited language. For this work, we developed our own data set in the standard format used in this domain. The data are tagged with the necessary information, such as part-of-speech (POS) tags, named entities, chunking information, gender, number, and person. We divided the data into training and test sets in an approximately 5:1 ratio and used ML classifiers for training and testing. The results are encouraging and motivate further work.
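The data preparation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how instances are built, so the mention-pair formulation (one instance per anaphor and candidate antecedent), the function names `split_five_to_one` and `pair_features`, the dictionary keys, and the tag values are all assumptions made for illustration.

```python
import random


def split_five_to_one(instances, seed=0):
    """Shuffle instances and split them into train/test at roughly 5:1.

    Hypothetical helper: the abstract only states the approximate ratio,
    not how the split was drawn.
    """
    items = list(instances)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    cut = (len(items) * 5) // 6         # five sixths go to training
    return items[:cut], items[cut:]


def pair_features(anaphor, candidate):
    """Build agreement features for an (anaphor, candidate antecedent) pair.

    Assumes each mention is a dict carrying the annotations named in the
    abstract: POS tag, named-entity tag, gender, number, and person.
    """
    return {
        "gender_match": anaphor["gender"] == candidate["gender"],
        "number_match": anaphor["number"] == candidate["number"],
        "person_match": anaphor["person"] == candidate["person"],
        "candidate_is_named_entity": candidate["ne"] != "O",
        "candidate_pos": candidate["pos"],
    }


# Illustrative mentions with placeholder tag values (not taken from the paper's data).
pronoun = {"pos": "PRP", "ne": "O", "gender": "m", "number": "sg", "person": 3}
name = {"pos": "NNP", "ne": "PER", "gender": "m", "number": "sg", "person": 3}

features = pair_features(pronoun, name)
train, test = split_five_to_one(range(12))
```

Feature dictionaries of this kind could then be vectorized and passed to any standard classifier. Which classifiers the authors used is not stated in the abstract.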