Implementation of Automated Bengali Parts of Speech Tagger: An Approach Using Deep Learning Algorithm
Asraf Hossain Patoary, Md. Jahid Bin Kibria, Abdul Kaium
2020 IEEE Region 10 Symposium (TENSYMP), pp. 308-311, published 2020-06-05
DOI: 10.1109/TENSYMP50017.2020.9230907
Citation count: 5
Abstract
Parts-of-speech (POS) tagging is the task of assigning each word in a sentence its part of speech. It is a first and important step in many natural language processing (NLP) applications. For some languages POS tagging already achieves high accuracy, but for Bengali it remains an unsolved problem. Bengali is highly ambiguous and inflectional: a single word can have many variants formed by adding suffixes and prefixes. Although POS tagging for Bengali is not new, we aim to build a highly accurate model from a minimal dataset. We developed a deep learning model based mainly on suffixes, which are a core part of Bengali grammar. We experimented with a Bengali corpus of 2,927 words, each annotated with its POS tag. The proposed model achieves an accuracy of 93.90%. We also released the model as a Python package in our open-source Bengali Natural Language Processing Toolkit (BNLTK), which is now available on pypi.org.
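The abstract says the model relies mainly on suffixes but does not describe the feature set or the network. As a rough illustration only, the Python sketch below shows one plausible way to turn word suffixes into input vectors for a neural tagger, and how a token-level accuracy figure such as the reported 93.90% is computed. The suffix lengths (1 to 3), the `suf{n}=` feature naming, the multi-hot encoding and every function name here are hypothetical assumptions, not the authors' BNLTK code; the network that would map these vectors to tags is not shown.

```python
# Hypothetical sketch: suffix features for a Bengali POS tagger.
# None of these names come from the paper or from BNLTK.

# Assumed suffix lengths; the paper does not state which lengths were used.
SUFFIX_LENGTHS = (1, 2, 3)


def suffix_features(word, lengths=SUFFIX_LENGTHS):
    """Return suffix feature strings such as 'suf1=...' for a word.

    Slicing is by Unicode code point, so a suffix may cut through a
    grapheme cluster (e.g. split a consonant from its vowel sign).
    Lengths longer than the word are skipped.
    """
    feats = []
    for n in lengths:
        if len(word) >= n:
            feats.append(f"suf{n}={word[-n:]}")
    return feats


def build_vocab(words):
    """Assign a column index to every suffix feature seen in training words."""
    vocab = {}
    for w in words:
        for f in suffix_features(w):
            if f not in vocab:
                vocab[f] = len(vocab)
    return vocab


def vectorize(word, vocab):
    """Multi-hot vector over the suffix vocabulary; unseen suffixes are ignored."""
    vec = [0] * len(vocab)
    for f in suffix_features(word):
        idx = vocab.get(f)
        if idx is not None:
            vec[idx] = 1
    return vec


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    vocab = build_vocab(["মানুষের", "বাংলার"])
    print(suffix_features("মানুষের"))
    print(vectorize("মানুষের", vocab))
    print(token_accuracy(["N", "V", "N", "ADJ"], ["N", "V", "V", "ADJ"]))  # 0.75
```

Keying features on raw code-point suffixes keeps the sketch dependency-free; a real system might instead match against a curated list of Bengali inflectional suffixes or segment by grapheme cluster.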