Finding the presence of borrowings in scientific works based on Markov chains

Rustam Saakyan, I. Shpekht, Gevorg A. Petrosyan
Journal: Vestnik Sankt-Peterburgskogo Universiteta Seriya 10 Prikladnaya Matematika Informatika Protsessy Upravleniya
Publication date: 2023-01-01
Publication type: Journal Article
DOI: 10.21638/11701/spbu10.2023.104 (https://doi.org/10.21638/11701/spbu10.2023.104)

Abstract

The study aims to develop optimal approaches to detecting borrowings in scientific works. The article discusses the stages of the search for borrowings: preprocessing, rough filtering of texts, searching for similar texts, and searching for the borrowings themselves. The main focus is on describing approaches and techniques that can be implemented effectively at each stage. At the preprocessing stage, these may include converting uppercase characters to lowercase, removing punctuation marks, and removing stop words. At the rough filtering stage, they are filters by topic and by word frequency. At the stage of finding similar texts, they may include calculating the importance of words in the context of the text and representing each word as a vector in a multidimensional space in order to determine a proximity measure. Finally, at the stage of finding borrowings, they are a search for exact matches, a search for paraphrases, and a measure of the similarity of expressions. The scientific novelty lies in using Markov chains to find the similarity of texts for the second and third stages of the search for borrowings proposed by the authors. An example demonstrates the technique of using Markov chains for text representation, finding the most frequently occurring words, and building a graph of a Markov chain of words, and it outlines the prospects for using Markov chains of texts for rough filtering and for finding similar texts.
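The abstract does not give the authors' algorithm, so the following is only a minimal sketch of the steps it names, under stated assumptions: the stop-word list, the first-order (word-to-next-word) chain, and the Jaccard overlap of chain edges used as a similarity score are all illustrative choices, not the paper's method. The function names `preprocess`, `build_chain`, and `chain_similarity` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "as", "for", "on"}


def preprocess(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_chain(tokens):
    """Estimate first-order transition probabilities P(next word | current word)."""
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    chain = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        chain[word] = {nxt: n / total for nxt, n in followers.items()}
    return chain


def chain_similarity(chain_a, chain_b):
    """Jaccard overlap of the edge sets of two chains (a hypothetical measure)."""
    edges_a = {(w, n) for w, followers in chain_a.items() for n in followers}
    edges_b = {(w, n) for w, followers in chain_b.items() for n in followers}
    if not edges_a and not edges_b:
        return 0.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)


doc_a = "The Markov chain models the text. The Markov chain is a graph of words."
doc_b = "A Markov chain models a text as a graph."

tokens_a = preprocess(doc_a)
tokens_b = preprocess(doc_b)
chain_a = build_chain(tokens_a)  # nested dict: word -> {next word: probability}
chain_b = build_chain(tokens_b)

top_words = Counter(tokens_a).most_common(2)      # most frequent words in doc_a
similarity = chain_similarity(chain_a, chain_b)   # value in [0, 1]
```

The nested dictionary returned by `build_chain` is effectively the weighted directed graph of the word chain: keys are nodes, and each inner entry is an outgoing edge with its transition probability. A cheap edge-overlap score like this one could serve for rough filtering, while a finer comparison of the transition probabilities would suit the similar-text stage; both uses are suggestions here, not results from the paper.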