Rule Based Approach for Word Normalization in Transliterated Search Queries

Varsha M. Pathak, M. Joshi
International Journal of Linguistics and Computational Applications (journal article), published 2020-06-04. DOI: 10.30726/ijlca/v7.i2.2020.72002

Abstract

SMS-based information systems are the need of the age. Most present SMS-based information systems send one-way informative text messages generated from their respective knowledge systems. By applying information retrieval methodology with models such as the Vector Space Model, these systems can let users send queries that match their own information needs, which makes the system more useful from the user's point of view. This paper describes such an initiative for accessing literature such as poems, phrases, rhymes, stories, abhang and more. The mobile-based quick library access system MQuickLib allows users to access this literature by formulating transliterated queries. The system's knowledge base is built with the Vector Space Model by processing the document terms, which are matched with the query terms while allowing for spelling variations that arise from each user's transliteration style. The matching score is assigned by a set of rules that determine the distance between two terms: d_k, a term from a document, and q_j, a query term. Levenshtein's original minimum edit distance algorithm is modified by applying this rule-based approach. The rules were identified by collecting SMS queries from users for a given set of known queries in Marathi (Devanagari). Experiments were carried out on a collection of Marathi and Hindi literature that mainly includes songs, gazals, powadas, bharud and other forms. These documents are available in a standard transliteration form such as ITRANS (an Indic Transliteration System). This paper elaborates the rule-based approach and analyses the results to select an appropriate rule-based model, which is then applied in developing the MQuickLib system.
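The abstract does not list the rules themselves. The Python sketch below shows one way a rule-weighted variant of Levenshtein's algorithm could score a document term d_k against a query term q_j: ITRANS case variants (e.g. `A` typed as `a`), doubled vowels, a dropped aspirate `h` and a few swapped letters cost less than an ordinary edit. The rules, cost values, function names (`rule_distance`, `similarity`, `best_match`) and sample words are hypothetical assumptions for illustration, not the authors' rule set.

```python
"""Sketch of a rule-weighted Levenshtein distance for transliterated terms.

All rules and costs below are hypothetical illustrations, not the rule set
reported in the paper.
"""

VOWELS = set("aeiou")

# Hypothetical single-character pairs that users may swap when transliterating.
CONFUSABLE_PAIRS = {frozenset("vw"), frozenset("ie"), frozenset("zj")}

CASE_COST = 0.25          # ITRANS 'A' (long a) typed as 'a', 'T' as 't', ...
CONFUSABLE_COST = 0.5     # e.g. 'v' vs 'w'
DOUBLE_VOWEL_COST = 0.25  # 'aa' vs 'a', 'ee' vs 'e'
ASPIRATE_H_COST = 0.5     # 'kh' vs 'k', 'th' vs 't'


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting character a with character b."""
    if a == b:
        return 0.0
    if a.lower() == b.lower():
        return CASE_COST
    if frozenset((a.lower(), b.lower())) in CONFUSABLE_PAIRS:
        return CONFUSABLE_COST
    return 1.0


def indel_cost(word: str, i: int) -> float:
    """Cost of inserting or deleting word[i], judged by its neighbours in word."""
    ch = word[i].lower()
    prev = word[i - 1].lower() if i > 0 else ""
    nxt = word[i + 1].lower() if i + 1 < len(word) else ""
    if ch in VOWELS and ch in (prev, nxt):
        return DOUBLE_VOWEL_COST  # one half of a doubled vowel
    if ch == "h" and prev.isalpha() and prev not in VOWELS:
        return ASPIRATE_H_COST    # aspiration marker after a consonant
    return 1.0


def rule_distance(s: str, t: str) -> float:
    """Weighted edit distance between document term s and query term t."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel_cost(s, i - 1)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel_cost(t, j - 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(s, i - 1),              # delete s[i-1]
                d[i][j - 1] + indel_cost(t, j - 1),              # insert t[j-1]
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # match or substitute
            )
    return d[m][n]


def similarity(s: str, t: str) -> float:
    """Map the distance into [0, 1]; 1.0 means the terms are identical."""
    longest = max(len(s), len(t)) or 1
    return max(0.0, 1.0 - rule_distance(s, t) / longest)


def best_match(query_term: str, vocabulary: list[str]) -> tuple[str, float]:
    """Return the document term closest to query_term, with its distance."""
    return min(
        ((term, rule_distance(term, query_term)) for term in vocabulary),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    print(rule_distance("pAnI", "paani"))
    print(best_match("paani", ["pAnI", "pAna", "gANe"]))
```

With these assumed costs, the ITRANS term `pAnI` and the SMS spelling `paani` are 0.75 apart, whereas a unit-cost, case-sensitive Levenshtein distance would be 3. Because every cost is looked up per character and per position, the dynamic program keeps the O(|s|·|t|) complexity of the original algorithm; only the cost tables change when rules are added or tuned.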