Doklady Mathematics: Latest Articles

Common Digital Space of Scientific Knowledge as an Integrator of Polythematic Information Resources
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S106456242470176X
N. E. Kalenov, A. N. Sotnikov

The goals, objectives, and structure of the ontology of the Common Digital Space of Scientific Knowledge (CDSSK) are considered. The CDSSK is an integrated information structure that combines state scientific information systems presented on the Internet (the Great Russian Encyclopedia, the National Electronic Library, the State Catalog of Geographical Names, etc.) with industry information systems, databases, and electronic libraries (MathNet, Socionet, Scientific Heritage of Russia, etc.). CDSSK can be considered as an information basis for solving artificial intelligence problems. The article presents the unified structure of the CDSSK ontology developed at the Joint Supercomputer Center of the Russian Academy of Sciences and its modeling on an example of ten subject classes and eight auxiliary classes of objects of the CDSSK universal subspace.

Citations: 0
Towards Efficient Learning of GNNs on High-Dimensional Multilayered Representations of Tabular Data
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S1064562423701193
A. V. Medvedev, A. G. Djakonov

For prediction tasks on tabular data, additional information about the target variable can be extracted by examining the relationships between the objects. Specifically, if it is possible to obtain a graph in which the objects are represented as vertices and the relationships as edges, then the graph structure is likely to contain valuable information. Recent research has indicated that jointly training graph neural networks and gradient boosting models on this type of data can increase prediction accuracy. This article proposes new methods for learning on tabular data that incorporates a graph structure, in an attempt to combine modern multilayer techniques for processing tabular data with graph neural networks. In addition, we discuss ways to mitigate the computational complexity of the proposed models and conduct experiments in both inductive and transductive settings. Our findings demonstrate that the proposed approaches provide quality comparable to that of modern methods.

Citations: 0
Graph Models for Contextual Intention Prediction in Dialog Systems
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S106456242370117X
D. P. Kuznetsov, D. R. Ledneva

The paper introduces a novel methodology for predicting intentions in dialog systems through a graph-based approach. This methodology involves constructing graph structures that represent dialogs, thus capturing contextual information effectively. By analyzing results from various open and closed domain datasets, the authors demonstrate the substantial enhancement in intention prediction accuracy achieved by combining graph models with text encoders. The primary focus of the study is assessing the impact of diverse graph architectures and encoders on the performance of the proposed technique. The experimental outcomes affirm the superiority of graph neural networks in terms of both the Recall@k (MAR) metric and computational resources when compared to alternative methods. This research uncovers a novel avenue for intention prediction in dialog systems by leveraging graph-based representations.
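
For readers unfamiliar with the Recall@k metric mentioned in the abstract, the following is a minimal sketch of one common definition: the fraction of the relevant items that appear among the top k ranked predictions, averaged over queries. The abstract does not give the paper's exact formula, so the function names `recall_at_k` and `mean_recall_at_k`, the intent labels, and the averaging scheme are assumptions made for illustration only.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items found among the first k ranked items.

    Assumes `ranked` contains no duplicate items.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def mean_recall_at_k(rankings, relevants, k):
    """Average Recall@k over several queries (hypothetical aggregation)."""
    scores = [recall_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical ranked intent predictions for two dialog turns.
rankings = [["greet", "book", "cancel"], ["cancel", "greet", "book"]]
relevants = [{"book"}, {"cancel"}]
score_k1 = mean_recall_at_k(rankings, relevants, 1)
score_k2 = mean_recall_at_k(rankings, relevants, 2)
```

With these made-up rankings, the first turn's true intent is ranked second and the second turn's is ranked first, so the score rises when k grows from 1 to 2.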

Citations: 0
Algorithms with Gradient Clipping for Stochastic Optimization with Heavy-Tailed Noise
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S1064562423701144
M. Danilova

This article provides a survey of the results of several research studies [12–14, 26], in which open questions related to the high-probability convergence analysis of stochastic first-order optimization methods under mild assumptions on the noise were gradually addressed. In the beginning, we introduce the concept of gradient clipping, which plays a pivotal role in the development of stochastic methods for successful operation in the case of heavy-tailed distributions. Next, we examine the importance of obtaining the high-probability convergence guarantees and their connection with in-expectation convergence guarantees. The concluding sections of the article are dedicated to presenting the primary findings related to minimization problems and the results of numerical experiments.
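
As a rough illustration of the gradient clipping concept the abstract refers to, the sketch below rescales a stochastic gradient whenever its Euclidean norm exceeds a threshold, then takes an ordinary SGD step. This is the standard norm-clipping operator, not necessarily the exact variant analyzed in the surveyed works; the names `clip_gradient`, `clipped_sgd_step`, `clip_level`, and `step_size` are assumptions chosen for this example.

```python
import math


def clip_gradient(grad, clip_level):
    """Return grad rescaled so that its Euclidean norm is at most clip_level."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_level:
        return list(grad)  # small gradients pass through unchanged
    scale = clip_level / norm
    return [g * scale for g in grad]


def clipped_sgd_step(x, stochastic_grad, step_size, clip_level):
    """One SGD step that uses the clipped stochastic gradient."""
    clipped = clip_gradient(stochastic_grad, clip_level)
    return [xi - step_size * gi for xi, gi in zip(x, clipped)]


# A heavy-tailed noise sample can produce a very large gradient;
# clipping bounds how far a single step can move the iterate.
x_next = clipped_sgd_step([1.0, 1.0], [30.0, 40.0], step_size=0.5, clip_level=1.0)
```

Because the clipped gradient has norm at most `clip_level`, each step moves the iterate by at most `step_size * clip_level`, which is what limits the influence of rare, very large noise realizations.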

Citations: 0
Towards Discovery of the Differential Equations
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S1064562423701156
A. A. Hvatov, R. V. Titov

Differential equation discovery, a machine learning subfield, is used to develop interpretable models, particularly in nature-related applications. By expertly incorporating the general parametric form of the equation of motion and appropriate differential terms, algorithms can autonomously uncover equations from data. This paper explores the prerequisites and tools for independent equation discovery without expert input, eliminating the need for equation form assumptions. We focus on addressing the challenge of assessing the adequacy of discovered equations when the correct equation is unknown, with the aim of providing insights for reliable equation discovery without prior knowledge of the equation form.
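
To make the expert-guided setting that the abstract contrasts against more concrete, here is a minimal sketch in which the equation form is assumed in advance as du/dt = c1*u + c2*u^2 and only the coefficients are fitted to data by least squares. It is not the paper's method: the two-term candidate library, the function name `fit_two_term_library`, and the synthetic data generated from du/dt = -2u are all assumptions made for illustration.

```python
import math


def fit_two_term_library(times, values):
    """Least-squares fit of du/dt ~ c1*u + c2*u**2 (assumed equation form)."""
    # Central differences approximate du/dt at the interior sample points.
    u, du = [], []
    for i in range(1, len(values) - 1):
        u.append(values[i])
        du.append((values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1]))
    # Normal equations for the candidate library [u, u^2].
    a11 = sum(x ** 2 for x in u)
    a12 = sum(x ** 3 for x in u)
    a22 = sum(x ** 4 for x in u)
    b1 = sum(x * d for x, d in zip(u, du))
    b2 = sum(x ** 2 * d for x, d in zip(u, du))
    # Solve the 2x2 system with Cramer's rule.
    det = a11 * a22 - a12 * a12
    c1 = (b1 * a22 - b2 * a12) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return c1, c2


# Synthetic trajectory of du/dt = -2u with u(0) = 1.
times = [0.01 * i for i in range(101)]
values = [math.exp(-2.0 * t) for t in times]
c1, c2 = fit_two_term_library(times, values)
```

On this noise-free data the fit should return c1 close to -2 and c2 close to 0. The difficulty the abstract highlights is what to do when no such candidate library is supplied by an expert and the true equation is unknown.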

Citations: 0
Artificial Intelligence in Society
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S106456242355001X
A. L. Semenov

This article is the author’s review of the singularity in which events in the field of artificial intelligence (AI) are developing. A general view is offered on the role of revolutions in information technology as they expand the human personality. The current stage of personal expansion is considered, covering the last decade, especially 2023. The most important and common socially significant documents expressing concern about AI, as well as those that assert an optimistic view of events, are considered: ethical principles, important directions, requirements, and restrictions. The article takes a closer look at the pending European AI Act (AIA) and how different groups are reacting to it. Cultural and historical factors that can counteract the negative and catastrophic developments that may result from AI are highlighted. Possible mechanisms for preserving genuine knowledge among professionals and disseminating it among the general public are analyzed.

Citations: 0
Operator Estimates for Problems in Domains with Singularly Curved Boundary: Dirichlet and Neumann Conditions
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S1064562424701758
D. I. Borisov, R. R. Suleimanov

We consider a system of second-order semilinear elliptic equations in a multidimensional domain with an arbitrarily curved boundary contained in a narrow layer along the unperturbed boundary. The Dirichlet or Neumann condition is imposed on the curved boundary. In the case of the Neumann condition, rather natural and weak conditions are additionally imposed on the structure of the curving. Under these conditions, we show that the homogenized problem is one for the same system of equations in the unperturbed problem with a boundary condition of the same kind as on the perturbed boundary. The main result is operator $W_2^1$- and $L_2$-estimates.

Citations: 0
Introductory Words of AI Journey Team
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S1064562423900015
The AI Journey Team
Citations: 0
Neural Networks for Coordination Analysis
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S1064562423701181
A. I. Predelina, S. Yu. Dulikov, A. M. Alekseev

This paper is dedicated to the development of a novel method for coordination analysis (CA) in English using neural (deep learning) methods. An efficient solution to this task makes it possible to identify potentially valuable links and relationships between specific parts of a sentence, which makes the extraction of coordinate structures an important text preprocessing tool. In this study, a number of ideas for approaching the task within the framework of one-stage detectors were tested. The achieved results are comparable in quality to the current most advanced CA methods while processing more than three times as many sentences per unit time.

Citations: 0
Artificially Generated Text Fragments Search in Academic Documents
IF 0.5 | Zone 4 (Mathematics) | Q3 MATHEMATICS | Pub Date: 2024-03-11 | DOI: 10.1134/S1064562423701211
G. M. Gritsay, A. V. Grabovoy, A. S. Kildyakov, Yu. V. Chekhovich

Recent advances in text generative models make it possible to create artificial texts that look like human-written texts. A large number of methods for detecting texts obtained using large language models have already been developed. However, detection methods improve simultaneously with generation methods. Therefore, it is necessary to explore new generative models and modernize existing approaches to their detection. In this paper, we present an extensive analysis of existing detection methods, as well as a study of lexical, syntactic, and stylistic features of the generated fragments. Taking these developments into account, we have tested the methods of detecting machine-generated documents that are, in our opinion, of the highest quality, with a view to their further application in the scientific domain. Experiments were conducted for the Russian and English languages on the collected datasets. The developed methods improved the detection quality to an F1-score of 0.968 for Russian and 0.825 for English. The described techniques can be applied to detect generated fragments in scientific, research, and graduate papers.

Citations: 0