Rahmat Ali Rajper, Samina Rajper, Abdullah Maitlo, Ghulam Nabi
Title: Analysis and Comparative Study of POS Tagging Techniques for National (Urdu) Language and other Regional Languages of Pakistan
Journal: Sindh University Research Journal - Science Series
Published: 2021-12-20
DOI: 10.26692/surj.v53i04.4223
Abstract
Natural Language Processing (NLP) is the field that defines algorithms and techniques enabling computers to understand human language, and it is an integral part of speech recognition. Part-of-speech (POS) tagging, in which the words and sentences of a natural language are assigned grammatical classes, is considered one of the well-understood problems of NLP. Because tagging each word by hand is time-consuming and tedious, automating the tagging task is the way to automate lexical annotation of a language's text. Many languages already have well-developed POS tagging systems. Pakistan's regional languages are less developed in this respect for many reasons, and much work is still needed on their POS tagging systems. Some regional languages have POS tagging systems that still need refinement, while others need systems developed from scratch; Balochi, for example, has no POS tagging system at all. This study presents a comparative analysis of POS tagging approaches for the national language (Urdu) and the other regional languages of Pakistan. The approaches, the data sets they use, and their reported results are presented.
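To make the idea of assigning grammatical classes to words concrete, here is a minimal sketch of a most-frequent-tag (unigram) baseline tagger. It is not one of the approaches compared in the study. The toy English corpus, the tag names (`DET`, `NOUN`, `VERB`), and the function names `train_unigram_tagger` and `tag` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged toy corpus; a real tagger would be trained on a
# large annotated corpus for the target language.
TAGGED_SENTENCES = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
    [("cats", "NOUN"), ("run", "VERB")],
]


def train_unigram_tagger(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
    return {
        word: tag_counts.most_common(1)[0][0]
        for word, tag_counts in counts.items()
    }


def tag(words, model, default_tag="NOUN"):
    """Tag each word with its most frequent tag; unseen words get default_tag."""
    return [(word, model.get(word.lower(), default_tag)) for word in words]


model = train_unigram_tagger(TAGGED_SENTENCES)
tagged = tag(["the", "cat", "runs"], model)
```

A baseline like this ignores context, so an ambiguous word such as "run" always receives its majority tag. This limitation is one reason more elaborate tagging approaches exist.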