Impact of Stemming on Efficiency of Messages Likelihood Definition in Telegram Newsfeeds

Olesia Barkovska, Patrik Rusnak, Vitalii Tkachov, T. Muzyka
Published in: 2022 IEEE 3rd KhPI Week on Advanced Technology (KhPIWeek), 2022-10-03
DOI: 10.1109/KhPIWeek57572.2022.9916415 (https://doi.org/10.1109/KhPIWeek57572.2022.9916415)
Citations: 1

Abstract

The work is dedicated to the development of a system for assessing the credibility of text messages posted in Telegram newsfeeds. The work is relevant because news feeds in messengers and social networks concentrate information and can shape public opinion on state relations and political moods; the number of such feeds is constantly growing, and they are supported by bots and biased authors. The proposed system coordinates text parsing, text processing, a database of messages from official sources of information, and a client (author) database. The degree of similarity of the generated text messages is determined by computing the Damerau-Levenshtein distance in the Text Processing Module. The work shows that stemming incoming messages at the preprocessing stage can increase the efficiency of this module (by up to 1.44 times for messages of around 1500 symbols), because shortening words to their stems by discarding auxiliary parts such as suffixes and endings reduces the computational cost of the Damerau-Levenshtein method. Thus, stemming reduces the number of symbols to be processed when the Damerau-Levenshtein algorithm is applied, which demonstrates the feasibility of applying stemming in the preprocessing block.
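
The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The distance function below is the optimal string alignment variant of Damerau-Levenshtein, chosen here as an assumption because the abstract does not say which variant the paper uses. The suffix list, `toy_stem`, and `stem_message` are hypothetical stand-ins for a real stemmer. The dynamic-programming table has (len(a)+1) x (len(b)+1) cells, so shortening both messages by stemming shrinks the work roughly in proportion to the product of their lengths.

```python
# Hypothetical suffix list for illustration only; a real system would use
# a language-specific stemmer rather than this toy rule set.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_message(text: str) -> str:
    """Lowercase the message and stem each whitespace-separated word."""
    return " ".join(toy_stem(word) for word in text.lower().split())


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def table_cells(a: str, b: str) -> int:
    """Number of DP cells the distance computation fills for inputs a and b."""
    return (len(a) + 1) * (len(b) + 1)


if __name__ == "__main__":
    incoming = "Officials reported rising prices"
    official = "Officials reporting raised prices"
    raw_cells = table_cells(incoming.lower(), official.lower())
    stemmed_cells = table_cells(stem_message(incoming), stem_message(official))
    print("cells without stemming:", raw_cells)
    print("cells with stemming:", stemmed_cells)
    print("distance after stemming:",
          osa_distance(stem_message(incoming), stem_message(official)))
```

The cell counts printed by the usage block only show the direction of the effect on toy input; they are not a reproduction of the 1.44 times figure reported in the abstract.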