Pub Date: 2025-10-14
DOI: 10.3103/S000510552570089X
R. R. Nigmatullin, A. A. Litvinov, S. I. Osokin
In this paper, the authors propose the foundations of an original theory of quasi-reproducible experiments (QREs), based on the testable hypothesis that successive measurements are substantially correlated (that is, they have memory). From this hypothesis, which the authors call the verified partial correlation principle (VPCP) for brevity, it can be proved that a universal fitting function (UFF) exists for quasi-reproducible (QR) measurements. In other words, there is a common platform, or bridge, on which a true theory (one that claims to describe the data from first principles or verifiable models) meets an experiment that supplies measured data for verifying that theory, with the data cleaned as far as possible of the influence of uncontrollable factors and of the apparatus/software function. In effect, the theory gives the researcher a method for purifying the initial data, ending with a periodic curve that describes the data and is cleaned of a set of uncontrollable factors. This final curve corresponds to an ideal experiment. The theory has been tested on eddy-covariance ecological data on the content of CH4, CO2, and H2O in the local atmosphere at the site where the gas detectors are located. For these data there is no simple hypothesis with a minimal number of fitting parameters, so the fitting function that follows from this theory can serve as the only, and most reliable, quantitative description of this kind of data from the complex system under test. Once freed from uncontrollable factors, the final fitting function becomes purely periodic.
Practical applications of this theory, its place among alternative approaches (especially those of professional interest to ecologists), and its further development are also discussed in the paper.
Title: New Method of Description of Eddy-Covariance Ecologic Data. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 1, pp. S15–S28.
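The abstract does not state the functional forms behind the VPCP or the UFF. As a minimal sketch only, assume the simplest linear memory between successive measurements, y_{m+1}(x) ≈ a·y_m(x) + b, and a periodic final curve represented by a truncated Fourier series. The helper names (memory_coefficients, fourier_fit, evaluate) are hypothetical, and this is not the authors' derivation:

```python
import math

def memory_coefficients(prev, nxt):
    """Least-squares a, b in nxt[j] ~ a * prev[j] + b (assumed linear memory between runs)."""
    n = len(prev)
    mp = sum(prev) / n
    mn = sum(nxt) / n
    sxx = sum((p - mp) ** 2 for p in prev)
    sxy = sum((p - mp) * (q - mn) for p, q in zip(prev, nxt))
    a = sxy / sxx
    b = mn - a * mp
    return a, b

def fourier_fit(ys, n_harmonics):
    """Truncated Fourier coefficients for samples spaced uniformly over exactly one period.

    Uses discrete orthogonality, valid for n_harmonics < len(ys) / 2.
    """
    n = len(ys)
    a0 = sum(ys) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        ak = 2 / n * sum(y * math.cos(2 * math.pi * k * j / n) for j, y in enumerate(ys))
        bk = 2 / n * sum(y * math.sin(2 * math.pi * k * j / n) for j, y in enumerate(ys))
        coeffs.append((ak, bk))
    return a0, coeffs

def evaluate(a0, coeffs, t):
    """Value of the fitted periodic curve at phase t in [0, 1)."""
    return a0 + sum(
        ak * math.cos(2 * math.pi * k * t) + bk * math.sin(2 * math.pi * k * t)
        for k, (ak, bk) in enumerate(coeffs, start=1)
    )
```

Because the coefficients rely on discrete orthogonality, they are exact only for samples spaced uniformly over one full period; irregular eddy-covariance time stamps would call for a general least-squares fit instead.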
Pub Date: 2025-10-14
DOI: 10.3103/S0005105525700918
I. G. Olgina
With the advent of network science, it has become possible to explore complex network systems, including social and information networks, by representing them as graph models. The exponential growth in the total volume of scientific publications makes analyzing their interrelations increasingly relevant. To address these problems, network science is developing models and methods for citation networks. However, network metrics are not used when analyzing publications in citation databases. The paper considers how to build a decision support system for selecting information sources based on citation data for scientific publications. A software package has been developed to support decisions on identifying important publications in a given thematic area. It is based on a method that ranks publications by importance through analysis of citation networks; this method identifies publications that do not clearly stand out when ranked by known bibliometric indicators or by known node centrality measures in their pure form. A comparative analysis of software for visualizing and studying graphs and social networks of all types has also been conducted. Studies were carried out to confirm the effectiveness of the proposed decision support system in selecting information sources.
Title: Support System for the Selection of Information Sources in Citation Networks. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 1, pp. S29–S36.
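The abstract does not describe the ranking method itself. For orientation, the sketch below computes two standard baselines that such a method is compared against, raw citation counts (in-degree) and PageRank centrality, on a citation graph given as (citing, cited) pairs. It is an illustrative assumption, not the author's method; the function names and the damping factor 0.85 are conventional choices:

```python
from collections import Counter

def citation_counts(edges):
    """In-degree of every paper: how many times it is cited within the graph."""
    counts = Counter(dst for _, dst in edges)
    nodes = {v for edge in edges for v in edge}
    return {v: counts[v] for v in sorted(nodes)}

def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank over (citing, cited) edges; scores sum to 1."""
    nodes = sorted({v for edge in edges for v in edge})
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Papers that cite nothing inside the graph spread their score uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank
```

Comparing the two rankings on real data shows where centrality and raw counts disagree; the paper's own method is aimed at publications that neither kind of measure highlights in its pure form.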
Pub Date: 2025-10-14
DOI: 10.3103/S0005105525700980
K. A. Romadanskii, A. E. Akhaev, T. R. Gilyazov
In this article, a pseudoword is defined as a unit of speech or text that looks like a real Russian word but has no meaning. A real, or natural, word is a unit of speech or text that has an interpretation and is included in a dictionary. The paper presents two models for working with the Russian language: a generator that creates pseudowords resembling real words and a classifier that evaluates how similar an input character sequence is to real words. The classifier is used to evaluate the generator's output. Both models are recurrent neural networks with long short-term memory (LSTM) layers, trained on a dataset of Russian nouns. The resulting file contained a list of generated pseudowords, which the classifier then evaluated to filter out those not sufficiently similar to real words. Pseudowords can be used in naming, branding, and layout design; in art and other creative work; and in linguistic studies of the structure of the language and its words.
Title: Creating Generator of Pseudowords and Classifying Them by Similarity with Words from Russian Language Dictionary Using Machine Learning. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 1, pp. S59–S66.
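The paper's generator and classifier are LSTM networks. A much simpler stand-in that shows the same generate-then-filter pipeline is a character bigram model: it samples candidate strings from the bigram statistics of a noun list and keeps those that are absent from the dictionary and whose average log-probability exceeds a threshold. This swaps the LSTMs for a bigram model; the word list, threshold, and function names are hypothetical:

```python
import math
import random
from collections import Counter, defaultdict

BOS, EOS = "^", "$"  # word-boundary markers

def train_bigrams(words):
    """Count character transitions, including word start and end."""
    counts = defaultdict(Counter)
    for w in words:
        chars = [BOS] + list(w) + [EOS]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, rng, max_len=12):
    """Sample one candidate string from the bigram model."""
    out, prev = [], BOS
    while len(out) < max_len:
        row = counts[prev]
        chars = list(row)
        c = rng.choices(chars, weights=[row[ch] for ch in chars])[0]
        if c == EOS:
            break
        out.append(c)
        prev = c
    return "".join(out)

def score(counts, word, alphabet_size):
    """Average add-one-smoothed log-probability per transition (stand-in 'similarity')."""
    chars = [BOS] + list(word) + [EOS]
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        row = counts.get(a, Counter())
        total += math.log((row[b] + 1) / (sum(row.values()) + alphabet_size))
    return total / (len(chars) - 1)

def pseudowords(dictionary, n, threshold, seed=0, max_tries=10000):
    """Generate up to n distinct non-dictionary strings scoring at least `threshold`."""
    counts = train_bigrams(dictionary)
    alphabet_size = len({c for w in dictionary for c in w}) + 1  # letters plus EOS
    rng = random.Random(seed)
    known, found = set(dictionary), []
    for _ in range(max_tries):
        if len(found) >= n:
            break
        cand = generate(counts, rng)
        if cand in known or cand in found:
            continue
        if score(counts, cand, alphabet_size) >= threshold:
            found.append(cand)
    return found
```

For example, pseudowords(["кошка", "мышка", "книга", "ручка"], n=5, threshold=-2.5) returns up to five novel strings assembled from transitions seen in those nouns; a trained classifier, as in the paper, would replace the bigram score with a learned one.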
Pub Date: 2025-10-14
DOI: 10.3103/S0005105525700876
E. A. Znamenskaya, A. A. Pechnikov, D. E. Chebukov
Scientific coauthorship directly reflects scientific collaboration. There is empirical evidence of its value: articles with more authors tend to be cited more often, which matters for calculating various indices. Many foreign studies show growth in coauthorship both overall and within individual scientific disciplines; however, for a number of reasons it is quite difficult to assess coauthorship among Russian scientists on the basis of Web of Science or Scopus. Using data from Math-Net.Ru, this paper examines several aspects of coauthorship in the mathematical and computer sciences in Russia. In particular, it shows a small but steady increase in the average number of coauthors per publication and an increase in the number of coauthored articles over 2000–2020.
Title: Scientific Coauthorship in Russian Mathematics in 2000–2020: A Study Based on Leading Russian Journals. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 1, pp. S8–S14.
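The two indicators mentioned, the average number of coauthors per publication and the share of coauthored articles, together with a linear trend, can be computed as follows. A minimal sketch, assuming each publication is reduced to a (year, number of authors) record; the records and function names are hypothetical, not Math-Net.Ru data:

```python
from collections import defaultdict

def coauthorship_by_year(papers):
    """papers: iterable of (year, n_authors). Returns {year: (mean authors, share coauthored)}."""
    by_year = defaultdict(list)
    for year, n_authors in papers:
        by_year[year].append(n_authors)
    stats = {}
    for year in sorted(by_year):
        counts = by_year[year]
        mean_authors = sum(counts) / len(counts)
        share_coauthored = sum(1 for c in counts if c > 1) / len(counts)
        stats[year] = (mean_authors, share_coauthored)
    return stats

def trend_slope(stats):
    """Ordinary least-squares slope of mean authors per paper against year."""
    years = list(stats)
    values = [stats[y][0] for y in years]
    my = sum(years) / len(years)
    mv = sum(values) / len(values)
    num = sum((x - my) * (v - mv) for x, v in zip(years, values))
    den = sum((x - my) ** 2 for x in years)
    return num / den
```

A small positive slope over 2000–2020 would correspond to the "small but steady increase" the abstract reports; the paper's exact statistics may be defined differently.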
Pub Date: 2025-10-14
DOI: 10.3103/S0005105525700967
Yu. E. Polak
The paper examines the phenomenon of joint creative work by several authors, with examples from various fields of activity. The focus is on information technology, specifically inventions of the late 20th century. Their authors were duos of outstanding professionals who combined the talents of a programmer and a manager. These duos drove the further development of the IT industry and radically changed the way of life of all humanity. The stories behind the emergence of famous computers, operating systems, the World Wide Web, and network navigation tools are briefly described.
Title: Two Heads Are Better Than One. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 1, pp. S37–S46.
Pub Date: 2025-10-14
DOI: 10.3103/S0005105525700992
V. B. Chechnev
The increasing complexity of decision-making pushes organizations to automate this process, and decision support systems are one of the key elements of such automation. This paper examines their theoretical and practical aspects. The proposed definition, together with a list of the main attributes and functions of such systems, makes it possible to identify one of the most promising directions for applying artificial intelligence in this field: multiagent systems. An analysis of current decision support systems revealed their main competitive advantages and the weaknesses that many of them share, and showed the importance of developing a domestic intelligent decision support system.
Title: Using Decision-Support Systems in the Automation of Decision-Making Processes. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 1, pp. S67–S73.
Pub Date: 2025-10-14
DOI: 10.3103/S0005105525700979
T. A. Polilova
For many years, the World Wide Web Consortium (W3C) has been promoting the Web Accessibility Initiative (WAI), whose main goal is expressed in the slogan “Making the web accessible.” As part of WAI, the Web Content Accessibility Guidelines (WCAG) are being developed to help website developers take into account the needs of people with disabilities. Based on the WCAG recommendations, the Russian Federation has adopted the standard GOST R 52872-2019, some provisions of which are presented in this paper. Law No. 181-FZ on the social protection of persons with disabilities, in force since 1995, establishes a norm requiring developers of information resources to create conditions under which people with disabilities can freely use communications and information. The general provisions of this law are implemented in the directive documents of the relevant government departments. The paper considers the provisions of a 2023 order of the Ministry of Finance that set out the procedure for presenting information on organizations' websites in a form convenient for people with vision and hearing impairments. The provisions of this order encourage developers of websites of organizations subordinate to government bodies at various levels in the Russian Federation to ensure sufficient text contrast, use adaptive design, and provide non-text objects with a text layer or descriptions; these measures make the Internet easier for people with disabilities to use and contribute to the development of artificial intelligence tools.
Title: Accessible Internet: From the WAI Initiative to Russian Practice. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 1, pp. S47–S58.
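The text-contrast requirement can be checked numerically. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas and the level AA thresholds (4.5:1 for normal text, 3:1 for large text). It assumes, without checking the text of the standard, that GOST R 52872-2019 carries over these WCAG values; the function names are illustrative:

```python
def _linearize(channel_8bit):
    """Convert an sRGB channel (0-255) to linear light, as in the WCAG 2.x definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """WCAG relative luminance of an (R, G, B) colour with 8-bit channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1 to 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

def passes_aa(fg, bg, large_text=False):
    """Level AA check: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, gray #767676 text on a white background gives about 4.54:1 and passes AA for normal text, while #777777 gives about 4.48:1 and does not.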
Pub Date: 2025-10-14
DOI: 10.3103/S0005105525700839
A. S. Eremenko
The work is devoted to designing a metaverse concept as a new approach to popularizing scientific knowledge through user interaction with a virtual environment. The features of the metaverse's construction and the technological solutions needed to implement it are considered.
Title: The Metaverse History of The Earth—A New Look at the Popularization of Geological Knowledge. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 1, pp. S1–S7.
Pub Date: 2025-10-06
DOI: 10.3103/S0005105525700694
A. V. Lapko, V. A. Lapko
A method is described for evaluating the informative value of the arguments of a single-valued stochastic dependence at specific argument values under conditions of a priori uncertainty. Taking into account the asymptotic properties of a nonparametric collective, a consistent procedure for forming its structure is proposed. Unlike traditional nonparametric regression, the considered collective takes into account not only the information contained in observations of the variables of the reconstructed dependence but also the relationships between them. A distinctive feature of the nonparametric collective of linear approximations of the desired dependence is that it can be represented in a form that allows the informative value of the arguments to be assessed at their specific values. On this basis, a criterion is defined for ranking the arguments of the reconstructed function by their significance.
Title: A Method for Evaluating the Informative Value of Arguments of a Nonparametric Stochastic Dependence Model with Their Specific Values. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 4, pp. 252–255.
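The abstract does not specify the nonparametric collective. As a rough illustration of assessing argument informativeness at a specific point x0, the sketch substitutes a kernel-weighted local linear regression: the fitted local slopes, scaled by each argument's spread, are used to rank the arguments. The Gaussian kernel, the bandwidths h, and this importance measure are assumptions, not the authors' criterion:

```python
import math
import statistics

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def local_linear(X, y, x0, h):
    """Fit y ~ b0 + sum_j b_j (x_j - x0_j) with Gaussian kernel weights centred at x0."""
    d = len(x0)
    A = [[0.0] * (d + 1) for _ in range(d + 1)]
    rhs = [0.0] * (d + 1)
    for xi, yi in zip(X, y):
        z = [1.0] + [xi[j] - x0[j] for j in range(d)]
        w = math.exp(-0.5 * sum(((xi[j] - x0[j]) / h[j]) ** 2 for j in range(d)))
        for p in range(d + 1):  # accumulate normal equations (Z^T W Z) beta = Z^T W y
            rhs[p] += w * z[p] * yi
            for q in range(d + 1):
                A[p][q] += w * z[p] * z[q]
    return solve(A, rhs)

def rank_arguments(X, y, x0, h):
    """Rank arguments by |local slope at x0| times the argument's standard deviation."""
    beta = local_linear(X, y, x0, h)
    d = len(x0)
    spread = [statistics.pstdev([xi[j] for xi in X]) for j in range(d)]
    importance = [abs(beta[j + 1]) * spread[j] for j in range(d)]
    order = sorted(range(d), key=lambda j: importance[j], reverse=True)
    return order, importance, beta
```

Because the fit is local, the ranking can change from one x0 to another, which is the sense in which informativeness depends on specific argument values.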
Pub Date: 2025-10-06
DOI: 10.3103/S0005105525700645
A. V. Mylnikova, L. A. Mylnikov
This paper examines a model that uses text skeleton structures derived from syntactic parsing to preprocess text corpora before they are passed to neural machine translation (MT) models, in order to improve translation quality. The proposed model for text corpora is based on part-of-speech (POS) tagging and syntactic parsing; it is implemented with a BERT-based language model and a set of rules. Using a limited POS tagging dataset, the paper describes how data are prepared for training the model and how its performance can be improved. POS tags are used to obtain a syntactic parse, determine the sentence type, and apply word-order changes according to predefined rules. Applying the proposed model together with the Google and Yandex MT systems improved MT quality metrics (BLEU and TER) by 0.1–0.23 for the Russian–English and German–English language pairs.
Title: Language Models for Texts Preprocessing in Machine Translation. Automatic Documentation and Mathematical Linguistics, 2025, vol. 59, no. 4, pp. 256–268.
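The reported gains are in BLEU and TER. To make these metrics concrete, the sketch computes a sentence-level BLEU (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty, no smoothing) and a simplified TER that counts only word insertions, deletions, and substitutions. Real TER also allows block shifts, so this version can overestimate it; reported scores would normally come from an established implementation such as sacreBLEU. The function names are illustrative:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Unsmoothed sentence BLEU; returns 0.0 if any n-gram precision is zero."""
    log_p = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        total = sum(h.values())
        clipped = sum(min(c, r[g]) for g, c in h.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_p += math.log(clipped / total) / max_n
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_p)

def ter_no_shifts(hyp, ref):
    """Word-level edit distance divided by reference length (TER without shift operations)."""
    prev = list(range(len(ref) + 1))
    for i, hw in enumerate(hyp, 1):
        cur = [i] + [0] * len(ref)
        for j, rw in enumerate(ref, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (hw != rw))
        prev = cur
    return prev[-1] / len(ref)
```

Higher BLEU and lower TER both indicate better translations, so an improvement in TER appears as a decrease in its value.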