{"title":"Punjabi Stop Words: A Gurmukhi, Shahmukhi and Roman Scripted Chronicle","authors":"Jasleen Kaur, Jatinderkumar R. Saini","doi":"10.1145/2909067.2909073","DOIUrl":null,"url":null,"abstract":"With advent of Unicode encoding, Punjabi language content, written using gurmukhi script as well as in shahmukhi script, is increasing day by day on internet. Processing textual information involves passing it to various pre-processing phases. Stop-word elimination is one such sub phase. 256 Gurmukhi stop words had been identified from poetry, stories and online material and passed to Punjabi stemmer. After stemming, 184 stemmed stop words were generated and these stemmed stop words were passed to transliteration phase. This led to generation of stop words in shahmukhi script. For the first time in scientific community dealing with computational linguistics and literature processing using NLP techniques, the list of 184 stop words of Punjabi language is released for public usage and further NLP applications. The presented list consists of stop words of Punjabi language with their Gurmukhi, Shahmukhi as well as Roman scripted forms.","PeriodicalId":371590,"journal":{"name":"Women In Research","volume":"106 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Women In Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2909067.2909073","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22
Abstract
With advent of Unicode encoding, Punjabi language content, written using gurmukhi script as well as in shahmukhi script, is increasing day by day on internet. Processing textual information involves passing it to various pre-processing phases. Stop-word elimination is one such sub phase. 256 Gurmukhi stop words had been identified from poetry, stories and online material and passed to Punjabi stemmer. After stemming, 184 stemmed stop words were generated and these stemmed stop words were passed to transliteration phase. This led to generation of stop words in shahmukhi script. For the first time in scientific community dealing with computational linguistics and literature processing using NLP techniques, the list of 184 stop words of Punjabi language is released for public usage and further NLP applications. The presented list consists of stop words of Punjabi language with their Gurmukhi, Shahmukhi as well as Roman scripted forms.