Title: Part of Speech Tagging for Hindi Corpus
Authors: N. Mishra, Amit Mishra
Published in: 2011 International Conference on Communication Systems and Network Technologies
Publication date: 2011-06-03
DOI: 10.1109/CSNT.2011.118 (https://doi.org/10.1109/CSNT.2011.118)
Citations: 29
Abstract
The widespread use of the internet for information search has increased the use of computational linguistics, because most search systems rely on a bag-of-words model, which causes retrieval problems due to polysemy, homonymy, and synonymy [9][3]. This has led to a shift in the accepted boundary between the query information that humans submit and the further interpretation, in the form of annotation of that query information, that can be performed to get better results [1][4]. In this regard, the objective of this paper is the process of annotating each word in a text with its part of speech. POS tagging is much harder than compiling a list of words and their parts of speech, because most words can take more than one part of speech in different contexts, and some parts of speech of these words are rather complex or unspoken [5][6]. A large number of POS taggers are available for English and perform satisfactorily, but they cannot be applied to Hindi because of structural differences [8]. This paper aims at part-of-speech tagging for a Hindi corpus, as the number of Hindi documents on the internet is growing rapidly.
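The point that tagging is harder than looking words up in a list can be illustrated with a minimal sketch. It is not the tagger described in the paper: the toy lexicon, the coarse tag names, and the helper functions `candidate_tags` and `ambiguous_tokens` are all assumptions made for illustration. It only shows that a plain lookup leaves some Hindi words with more than one candidate tag, so context is needed to choose among them.

```python
# Hypothetical toy lexicon: each word maps to every tag it can take.
# The entries and the coarse tag names are illustrative, not from the paper.
LEXICON = {
    "सोना": {"NOUN", "VERB"},  # "gold" or "to sleep"
    "आम": {"NOUN", "ADJ"},     # "mango" or "common"
    "महंगा": {"ADJ"},          # "expensive"
    "है": {"AUX"},             # "is"
}


def candidate_tags(tokens, lexicon=LEXICON):
    """Pair each token with its sorted candidate tags; unknown words get ["UNK"]."""
    return [(tok, sorted(lexicon.get(tok, {"UNK"}))) for tok in tokens]


def ambiguous_tokens(tokens, lexicon=LEXICON):
    """Return the tokens that a lexicon lookup alone cannot resolve to one tag."""
    return [tok for tok, tags in candidate_tags(tokens, lexicon) if len(tags) > 1]


if __name__ == "__main__":
    sentence = "सोना महंगा है".split()  # "Gold is expensive"
    print(candidate_tags(sentence))
    print(ambiguous_tokens(sentence))
```

In this sentence the lookup leaves "सोना" with both a noun and a verb reading. A tagger has to use the surrounding words to select the noun, which is the step that a word list alone cannot provide.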