Punjabi Stop Words: A Gurmukhi, Shahmukhi and Roman Scripted Chronicle

Jasleen Kaur, Jatinderkumar R. Saini
Women In Research, published 2016-03-21
DOI: https://doi.org/10.1145/2909067.2909073
Citations: 22

Abstract

With the advent of Unicode encoding, Punjabi-language content written in both the Gurmukhi and Shahmukhi scripts is growing steadily on the internet. Processing textual information involves passing it through several pre-processing phases, and stop-word elimination is one such sub-phase. In this work, 256 Gurmukhi stop words were identified from poetry, stories and online material and passed to a Punjabi stemmer. Stemming yielded 184 stemmed stop words, which were then passed to a transliteration phase to generate the corresponding stop words in Shahmukhi script. For the first time in the scientific community working on computational linguistics and literature processing with NLP techniques, a list of 184 Punjabi stop words is released for public use and further NLP applications. The list presents each stop word in its Gurmukhi, Shahmukhi and Roman-script forms.
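As a rough illustration of the stop-word elimination phase mentioned above, the following Python sketch filters Gurmukhi tokens against a stop-word set. The names `SAMPLE_STOP_WORDS`, `normalize` and `remove_stop_words` are hypothetical. The six sample words are common Punjabi function words chosen only for illustration and are not taken from the paper's 184-word list. The whitespace tokenization and the NFC normalization step are also assumptions, not the authors' method.

```python
import unicodedata

# Hypothetical sample only: a few common Punjabi function words in Gurmukhi.
# This is NOT the 184-word list released with the paper.
SAMPLE_STOP_WORDS = {"ਦਾ", "ਦੇ", "ਦੀ", "ਨੂੰ", "ਤੇ", "ਹੈ"}


def normalize(token):
    """Apply Unicode NFC so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", token)


def remove_stop_words(text, stop_words=SAMPLE_STOP_WORDS):
    """Return the tokens of `text` that are not in `stop_words`.

    Tokenization is a plain whitespace split (an assumption); the danda
    sentence terminator is treated as a separator.
    """
    stop = {normalize(w) for w in stop_words}
    tokens = text.replace("।", " ").split()
    return [t for t in map(normalize, tokens) if t not in stop]


if __name__ == "__main__":
    print(remove_stop_words("ਇਹ ਰਾਮ ਦੀ ਕਿਤਾਬ ਹੈ।"))
    # prints ['ਇਹ', 'ਰਾਮ', 'ਕਿਤਾਬ']
```

In practice the sample set would be replaced by the released list, loaded in whichever script (Gurmukhi, Shahmukhi or Roman) matches the input text.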