Generating Survival Interpretable Trajectories and Data
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424601999
A. V. Konstantinov, S. R. Kirpichenko, L. V. Utkin
A new model for generating survival trajectories and data, based on an autoencoder of a specific structure, is proposed. It solves three tasks. First, it provides predictions, in the form of the expected event time and the survival function, for a new feature vector using the Beran estimator. Second, the model generates additional data from a given training set to supplement the original dataset. Third, and most importantly, it generates a prototype time-dependent trajectory for an object, which characterizes how the object's features could be changed to achieve a different time to event. The trajectory can be viewed as a kind of counterfactual explanation. The proposed model is robust during training and inference due to a specific weighting scheme incorporated into the variational autoencoder. The model also determines the censoring indicators of newly generated data by solving a classification task. The paper demonstrates the efficiency and properties of the proposed model in numerical experiments on synthetic and real datasets. The code implementing the proposed model is publicly available.
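For readers unfamiliar with the Beran estimator mentioned above: it is a kernel-weighted generalization of the Kaplan–Meier estimator that yields a conditional survival function. A minimal sketch is given below; the Gaussian kernel, the bandwidth, and all variable names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def beran_survival(x_new, X, times, deltas, bandwidth=1.0):
    """Beran (conditional Kaplan-Meier) estimate of S(t | x_new).

    X      : (n, d) training feature vectors
    times  : (n,)  observed times (event or censoring)
    deltas : (n,)  event indicators (1 = event, 0 = censored)
    Returns the sorted time grid and S(t | x_new) on that grid.
    """
    # Kernel weights of training objects relative to the new object
    # (a Gaussian kernel is an illustrative choice).
    dists = np.linalg.norm(X - x_new, axis=1)
    w = np.exp(-0.5 * (dists / bandwidth) ** 2)
    w = w / w.sum()

    # Sort observations by time.
    order = np.argsort(times)
    t_sorted, d_sorted, w_sorted = times[order], deltas[order], w[order]

    # Beran product-limit formula:
    # S(t | x) = prod_{i: t_i <= t, d_i = 1} (1 - w_i / (1 - sum_{j < i} w_j))
    cum_w = np.concatenate(([0.0], np.cumsum(w_sorted)[:-1]))
    factors = np.where(d_sorted == 1,
                       1.0 - w_sorted / np.clip(1.0 - cum_w, 1e-12, None),
                       1.0)
    return t_sorted, np.cumprod(factors)
```

The expected event time reported by such a model can then be approximated by numerically integrating the resulting survival curve.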
{"title":"Generating Survival Interpretable Trajectories and Data","authors":"A. V. Konstantinov, S. R. Kirpichenko, L. V. Utkin","doi":"10.1134/S1064562424601999","DOIUrl":"10.1134/S1064562424601999","url":null,"abstract":"<p>A new model for generating survival trajectories and data based on applying an autoencoder of a specific structure is proposed. It solves three tasks. First, it provides predictions in the form of the expected event time and the survival function for a new feature vector based on the Beran estimator. Second, the model generates additional data based on a given training set that would supplement the original dataset. Third, the most important, it generates a prototype time-dependent trajectory for an object, which characterizes how features of the object could be changed to achieve a different time to an event. The trajectory can be viewed as a type of the counterfactual explanation. The proposed model is robust during training and inference due to a specific weighting scheme incorporated into the variational autoencoder. The model also determines the censored indicators of new generated data by solving a classification task. The paper demonstrates the efficiency and properties of the proposed model using numerical experiments on synthetic and real datasets. The code of the algorithm implementing the proposed model is publicly available.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S75 - S86"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S1064562424601999.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Environments for Automatic Curriculum Learning: A Short Survey
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602099
M. I. Nesterova, A. A. Skrynnik, A. I. Panov
Reinforcement learning encompasses various approaches that involve training an agent on multiple tasks. These approaches include training a general agent capable of executing a wide range of tasks and training a specialized agent focused on mastering a specific skill. Curriculum learning strategically orders tasks to optimize the learning process, enhancing training efficiency and improving overall performance. Researchers developing novel methods must select appropriate environments for evaluation and comparison with other methods. We present an overview of environments suitable for assessing curriculum learning methods, highlighting their key differences. This work details task components, modifications, and a classification of existing curriculum learning methods. We aim to provide researchers with valuable insights into the selection and use of environments for evaluating curriculum learning approaches.
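As a generic illustration of the task-ordering idea the survey discusses (not a method taken from any particular environment or paper), a minimal threshold-based curriculum scheduler might look as follows; the success-rate threshold and the window size are assumptions.

```python
from collections import deque

class ThresholdCurriculum:
    """Advance to the next task once the agent's recent success rate
    on the current task exceeds a threshold (illustrative scheme)."""

    def __init__(self, tasks, threshold=0.8, window=100):
        self.tasks = tasks              # tasks ordered from easy to hard
        self.threshold = threshold
        self.results = deque(maxlen=window)
        self.idx = 0

    @property
    def current_task(self):
        return self.tasks[self.idx]

    def report(self, success):
        """Record the outcome of one episode on the current task."""
        self.results.append(float(success))
        window_full = len(self.results) == self.results.maxlen
        mastered = window_full and sum(self.results) / len(self.results) >= self.threshold
        if mastered and self.idx < len(self.tasks) - 1:
            self.idx += 1               # move on to a harder task
            self.results.clear()
```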
{"title":"Environments for Automatic Curriculum Learning: A Short Survey","authors":"M. I. Nesterova, A. A. Skrynnik, A. I. Panov","doi":"10.1134/S1064562424602099","DOIUrl":"10.1134/S1064562424602099","url":null,"abstract":"<p>Reinforcement learning encompasses various approaches that involve training an agent on multiple tasks. These approaches include training a general agent capable of executing a wide range of tasks and training a specialized agent focused on mastering a specific skill. Curriculum learning strategically orders tasks to optimize the learning process, enhancing training efficiency and improving overall performance. Researchers developing novel methods must select appropriate environments for evaluation and comparison with other methods. We introduce an overview of environments suitable for assessing curriculum learning methods, highlighting their key differences. This work details task components, modifications, and a classification of existing curriculum learning methods. We aim to provide researchers with valuable insights into the selection and utilization of environments for evaluating curriculum learning approaches.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S223 - S229"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S1064562424602099.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Review of Multimodal Environments for Reinforcement Learning
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602166
Z. A. Volovikova, M. P. Kuznetsova, A. A. Skrynnik, A. I. Panov
This article presents a review and comparative analysis of multimodal virtual environments for reinforcement learning. Seven environments are considered (HomeGrid, BabyAI, RTFM, Messenger, Touchdown, Alfred, and IGLU), with the focus on their distinctive features and the requirements they place on agents. Particular attention is paid to parameters such as the complexity of the text instructions and the dynamic properties of the environment. The analysis identifies the strengths and weaknesses of each environment, which makes it possible to determine the optimal conditions for effective agent training, and also highlights the need for more balanced environments that combine demanding language understanding with rich interaction with the surroundings.
{"title":"Review of Multimodal Environments for Reinforcement Learning","authors":"Z. A. Volovikova, M. P. Kuznetsova, A. A. Skrynnik, A. I. Panov","doi":"10.1134/S1064562424602166","DOIUrl":"10.1134/S1064562424602166","url":null,"abstract":"<p>This article presents a review and comparative analysis of multimodal virtual environments for reinforcement learning. Seven different environments are considered, including the HomeGrid, BabyAI, RTFM, Messenger, Touchdown, Alfred, and IGLU, and research is focused on their peculiarities and requirements to agents. The main attention is paid to such parameters as complexity of text instructions and the dynamic properties of the environment. The conducted analysis identifies the strengths and weaknesses of each environment, which allows determining the optimal conditions for effective agent training, and also emphasizes the need to create more balanced environments combining high requirements to both understanding of language and interaction with the surrounding.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S110 - S116"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S1064562424602166.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Empirical Approach to Sample Size Estimation for Testing of AI Algorithms
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602063
M. R. Kodenko, T. M. Bobrovskaya, R. V. Reshetnikov, K. M. Arzamasov, A. V. Vladzymyrskyy, O. V. Omelyanskaya, Yu. A. Vasilev
Sample size calculation is one of the basic tasks in the correct and objective testing of artificial intelligence (AI) algorithms. Existing approaches, despite their thorough theoretical justification, can give results that differ by an order of magnitude under the same initial conditions. Most of the input parameters for such methods are chosen by the researcher intuitively or on the basis of relevant literature in the subject area. Such uncertainty at the research planning stage carries a high risk of biased results, which is especially important to take into account when AI algorithms are used for medical diagnosis. In this work, an empirical study of the minimum sample size of radiology diagnostic studies required to obtain an objective value of the AUROC metric was conducted. An algorithm for calculating the threshold sample size, based on the criterion that the metric value shows no statistically significant change as the sample size increases, was developed and implemented in software. Using datasets containing the results of testing AI algorithms on mammographic and radiographic studies, with a total volume of more than 300 thousand studies, the empirical threshold was calculated for sample sizes from 30 to 25 thousand studies and for a relative pathology content from 10 to 90%. The proposed algorithm yields results invariant to the class balance in the sample, the target AUROC value, the study modality, and the AI algorithm. The empirical value of the minimum sufficient sample size for testing an AI algorithm for binary classification, obtained by analyzing over 2 million estimated values, is 400 studies. The results can be used in the development and testing of diagnostic tools, including AI algorithms.
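A minimal sketch of the kind of procedure the abstract describes is shown below: grow the test sample, estimate AUROC with a bootstrap confidence interval at each size, and stop when a larger sample no longer changes the estimate beyond its uncertainty. The bootstrap scheme, the stabilization criterion, and the sample-size grid are assumptions made for illustration, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, rng=None):
    """Point estimate and bootstrap confidence interval for AUROC."""
    rng = rng or np.random.default_rng(0)
    n, stats = len(y_true), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if len(np.unique(y_true[idx])) < 2:      # resample must contain both classes
            continue
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, y_score), lo, hi

def empirical_threshold(y_true, y_score, sizes, rng=None):
    """First sample size (sizes must be ascending) whose AUROC estimate agrees
    with the previous one, i.e., each point estimate lies inside the other's CI."""
    rng = rng or np.random.default_rng(0)
    prev = None
    for size in sizes:
        idx = rng.choice(len(y_true), size, replace=False)
        est, lo, hi = auroc_ci(y_true[idx], y_score[idx], rng=rng)
        if prev is not None and lo <= prev[0] <= hi and prev[1] <= est <= prev[2]:
            return size
        prev = (est, lo, hi)
    return sizes[-1]                              # never stabilized on this grid
```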
{"title":"Empirical Approach to Sample Size Estimation for Testing of AI Algorithms","authors":"M. R. Kodenko, T. M. Bobrovskaya, R. V. Reshetnikov, K. M. Arzamasov, A. V. Vladzymyrskyy, O. V. Omelyanskaya, Yu. A. Vasilev","doi":"10.1134/S1064562424602063","DOIUrl":"10.1134/S1064562424602063","url":null,"abstract":"<p>Calculation of sample size is one of the basic tasks in the field of correct and objective testing of artificial intelligence (AI) algorithms. Existing approaches, despite their exhaustive theoretical justification, can give results that differ by an order of magnitude under the same initial conditions. Most of the input parameters for such methods are determined by the researcher intuitively or on the basis of relevant literature data in the subject area. Such uncertainty at the research planning stage is associated with a high risk of obtaining biased results, which is especially important to take into account when using AI algorithms for medical diagnosis. Within the framework of this work, an empirical study of the value of the minimum required sample size of radiology diagnostic studies to obtain an objective value of the AUROC metric was conducted. An algorithm for calculating the threshold value of sample size according to the criterion of no statistically significant changes in the metric value in case of increasing this size was developed and implemented in software format. Using datasets containing the results of testing of AI algorithms on mammographic and radiographic studies with the total volume of more than 300 thousand, the empirical threshold for the sample size from 30 to 25 thousand studies with different relative content of pathology—from 10 to 90%—was calculated. The proposed algorithm allows obtaining results invariant to the balance of classes in the sample, the target value of AUROC, the modality of studies, and the AI algorithm. The empirical value of the minimum sufficient sample size for testing the AI algorithm for binary classification, obtained by analyzing over 2 million estimated values, is 400 studies. The results can be used to solve the problems of development and testing of diagnostic tools, including AI algorithms.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S62 - S74"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S1064562424602063.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introductory Words of AI Journey Team
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424702272
The AI Journey Team
{"title":"Introductory Words of AI Journey Team","authors":"The AI Journey Team","doi":"10.1134/S1064562424702272","DOIUrl":"10.1134/S1064562424702272","url":null,"abstract":"","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S1 - S1"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stratified Statistical Models in Hardware Reliability Analysis
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424601963
I. A. Vasilev, I. O. Filimonova, M. I. Petrovskiy, I. V. Mashechkin
Reliability analysis is becoming paramount to the successful operation of systems. This paper considers the problem of hardware failure using hard disk drives and solid-state drives as examples. Survival analysis methods are used to predict hardware degradation by estimating the probability of an event occurring over time. Survival models also account for incomplete information about the true time to event for censored observations. However, popular statistical methods do not account for features of real data such as the presence of outliers and categorical variables. In this paper, we propose to extend classical survival statistical methods by introducing an interpretable stratifying tree, each leaf of which corresponds to a statistical model. The experimental study evaluates how the quality of the models depends on the depth of the tree. According to the experimental results, the proposed method outperforms classical statistical models. The results of the study demonstrate the effectiveness of the proposed approach and its potential in the reliability analysis of complex technical systems.
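A minimal sketch of the stratify-then-fit idea is given below: a shallow tree partitions the objects into interpretable strata, and a survival model is fitted independently in each leaf. The use of a scikit-learn regression tree for the partition and of a Kaplan–Meier curve as the per-leaf model are illustrative assumptions; the paper's own tree construction may differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def kaplan_meier(times, events):
    """Plain Kaplan-Meier estimate S(t) on the sorted time grid."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    at_risk = len(t) - np.arange(len(t))          # objects still at risk at each step
    factors = np.where(d == 1, 1.0 - 1.0 / at_risk, 1.0)
    return t, np.cumprod(factors)

class StratifiedSurvival:
    """Shallow tree splits objects into strata; a survival model
    (here a Kaplan-Meier curve) is fitted separately in each leaf."""

    def __init__(self, max_depth=2, min_samples_leaf=50):
        self.tree = DecisionTreeRegressor(max_depth=max_depth,
                                          min_samples_leaf=min_samples_leaf)
        self.leaf_models = {}

    def fit(self, X, times, events):
        # Regressing on the observed time is only a way to obtain an
        # interpretable partition of the feature space.
        self.tree.fit(X, times)
        leaves = self.tree.apply(X)
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = kaplan_meier(times[mask], events[mask])
        return self

    def survival_curve(self, x):
        leaf = self.tree.apply(x.reshape(1, -1))[0]
        return self.leaf_models[leaf]
```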
{"title":"Stratified Statistical Models in Hardware Reliability Analysis","authors":"I. A. Vasilev, I. O. Filimonova, M. I. Petrovskiy, I. V. Mashechkin","doi":"10.1134/S1064562424601963","DOIUrl":"10.1134/S1064562424601963","url":null,"abstract":"<p>Reliability analysis is becoming paramount to the successful operation of systems. This paper considers the problem of hardware failure using hard disc drives and solid state drives as examples. Survivability analysis methods are used to predict hardware degradation by estimating the probability of an event occurring over time. Also, survival models account for incomplete data about the true time to event for censored observations. However, popular statistical methods do not account for features of real data such as the presence of outliers and categorical variables. In this paper, we propose to extend classical survival statistical methods by introducing an interpretable stratifying tree, each leaf of which corresponds to a statistical model. The experimental study is based on evaluating the dependence of the models’ quality as the depth of the tree increases. According to the experimental results, the proposed method outperforms classical statistical models. The results of the study demonstrate the effectiveness of the proposed approach and its potential in the field of reliability of complex technical systems.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S103 - S109"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S1064562424601963.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Network Image Classifiers Informed by Factor Analyzers
Pub Date: 2025-03-22 | DOI: 10.1134/S106456242460204X
A. M. Dostovalova, A. K. Gorshenin
The paper develops an approach to the probability informing of deep neural networks, that is, improving their results by using various probability models within architectural elements. We introduce factor analyzers with additive and impulse noise components as such models. The identifiability of the model is proved. The relationship between the least-squares and maximum-likelihood parameter estimates is established, which means that the estimates of the factor-analyzer parameters obtained within the informed block are unbiased and consistent. The mathematical model is used to create a new architectural element that fuses multiscale image features to improve classification accuracy when the volume of training data is small. This problem is typical for various applied tasks, including remote sensing data analysis. Several widely used neural network classifiers (EfficientNet, MobileNet, and Xception), both with and without the new informed block, are tested. It is demonstrated that, on the open datasets UC Merced (remote sensing data) and Oxford Flowers (flower images), informed neural networks achieve a significant increase in accuracy for this class of tasks: the largest improvement in Top-1 accuracy was 6.67% (with a mean accuracy of 87.3% without informing), while Top-5 accuracy increased by 1.49% (from a mean base value of 96.27%).
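For reference, the classical factor-analyzer observation model has the following form; the impulse (outlier) component is written only schematically, since the paper's exact parameterization is not reproduced in the abstract.

```latex
% Factor analyzer with additive and impulse noise (schematic):
% x -- observed features, f -- latent factors, \Lambda -- loading matrix.
\begin{aligned}
  x &= \mu + \Lambda f + \varepsilon + \eta, \\
  f &\sim \mathcal{N}(0, I_k), \qquad
  \varepsilon \sim \mathcal{N}(0, \Psi), \ \Psi \ \text{diagonal}, \\
  \eta &: \ \text{sparse impulse (outlier) noise, nonzero with small probability } \pi .
\end{aligned}
```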
{"title":"Neural Network Image Classifiers Informed by Factor Analyzers","authors":"A. M. Dostovalova, A. K. Gorshenin","doi":"10.1134/S106456242460204X","DOIUrl":"10.1134/S106456242460204X","url":null,"abstract":"<p>The paper develops an approach to probability informing deep neural networks, that is, improving their results by using various probability models within architectural elements. We introduce factor analyzers with additive and impulse noise components as such models. The identifiability of the model is proved. The relationship between the parameter estimates by the methods of least squares and maximum likelihood is established, which actually means that the estimates of the parameters of the factor analyzer obtained within the informed block are unbiased and consistent. A mathematical model is used to create a new architectural element that implements the fusion of multiscale image features to improve classification accuracy in the case of a small volume of training data. This problem is typical for various applied tasks, including remote sensing data analysis. Various widely used neural network classifiers (EfficientNet, MobileNet, and Xception), both with and without a new informed block, are tested. It is demonstrated that on the open datasets UC Merced (remote sensing data) and Oxford Flowers (flower images), informed neural networks achieve a significant increase in accuracy for this class of tasks: the largest improvement in Top-1 accuracy was 6.67% (mean accuracy without informing equals 87.3%), while Top-5 accuracy increased by 1.49% (mean base accuracy value is 96.27%).</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S35 - S41"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S106456242460204X.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Solution of the Multimode Nonlinear Schrödinger Equation Using Physics-Informed Neural Networks
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602105
I. A. Chuprov, J. Gao, D. S. Efremenko, F. A. Buzaev, V. V. Zemlyakov
Single-mode optical fibers (SMFs) have become the foundation of modern communication systems. However, their capacity is expected to reach its theoretical limit in the near future. The use of multimode fibers (MMF) is seen as one of the most promising solutions to address this capacity deficit. The multimode nonlinear Schrödinger equation (MMNLSE) describing light propagation in MMF is significantly more complex than the equations for SMF, making numerical simulations of MMF-based systems computationally costly and impractical for most realistic scenarios. In this paper, we apply physics-informed neural networks (PINNs) to solve the MMNLSE. We show that a simple implementation of PINNs does not yield satisfactory results. We investigate the convergence of PINN and propose a novel scaling transformation for the zeroth-order dispersion coefficient that allows PINN to account for all important physical effects. Our calculations show good agreement with the Split-Step Fourier (SSF) method for fiber lengths of up to several hundred meters.
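A minimal PyTorch-style sketch of a PINN residual loss for the scalar (single-mode) NLSE is given below purely to illustrate the mechanism; the coupled multimode equations, the proposed dispersion rescaling, and all coefficient values are omitted or assumed, and the sign convention is one of several in use.

```python
import torch

# Illustrative fiber parameters (assumed values, not taken from the paper).
beta2, gamma = -2.0e-2, 1.3e-3

net = torch.nn.Sequential(                      # maps (z, t) -> (Re A, Im A)
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 2),
)

def nlse_residual(z, t):
    """Mean squared residual of i A_z - (beta2/2) A_tt + gamma |A|^2 A = 0."""
    zt = torch.stack([z, t], dim=-1).requires_grad_(True)
    out = net(zt)
    u, v = out[..., 0], out[..., 1]             # real and imaginary parts of A

    def grads(f):                               # returns (df/dz, df/dt)
        g = torch.autograd.grad(f.sum(), zt, create_graph=True)[0]
        return g[..., 0], g[..., 1]

    u_z, u_t = grads(u)
    v_z, v_t = grads(v)
    _, u_tt = grads(u_t)
    _, v_tt = grads(v_t)

    amp2 = u ** 2 + v ** 2
    res_re = -v_z - 0.5 * beta2 * u_tt + gamma * amp2 * u   # real part of residual
    res_im = u_z - 0.5 * beta2 * v_tt + gamma * amp2 * v    # imaginary part
    return (res_re ** 2 + res_im ** 2).mean()

# Training minimizes this residual on collocation points together with
# boundary/initial-condition terms (omitted here).
```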
{"title":"Solution of the Multimode Nonlinear Schrödinger Equation Using Physics-Informed Neural Networks","authors":"I. A. Chuprov, J. Gao, D. S. Efremenko, F. A. Buzaev, V. V. Zemlyakov","doi":"10.1134/S1064562424602105","DOIUrl":"10.1134/S1064562424602105","url":null,"abstract":"<p>Single-mode optical fibers (SMFs) have become the foundation of modern communication systems. However, their capacity is expected to reach its theoretical limit in the near future. The use of multimode fibers (MMF) is seen as one of the most promising solutions to address this capacity deficit. The multimode nonlinear Schrödinger equation (MMNLSE) describing light propagation in MMF is significantly more complex than the equations for SMF, making numerical simulations of MMF-based systems computationally costly and impractical for most realistic scenarios. In this paper, we apply physics-informed neural networks (PINNs) to solve the MMNLSE. We show that a simple implementation of PINNs does not yield satisfactory results. We investigate the convergence of PINN and propose a novel scaling transformation for the zeroth-order dispersion coefficient that allows PINN to account for all important physical effects. Our calculations show good agreement with the Split-Step Fourier (SSF) method for fiber lengths of up to several hundred meters.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S15 - S24"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S1064562424602105.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deforming Implicit Neural Representation Generative Adversarial Network for Unsupervised Appearence Editing
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602373
S. Ignatiev, V. Egiazarian, R. Rakhimov, E. Burnaev
In this work, we present a new deep generative model for disentangling image shape from its appearance through differentiable warping. We propose to use implicit neural representations for modeling the deformation field and show that coordinate-based representations hold the necessary inductive bias. Unlike the previous warping-based approaches, which tend to model only local and small-scale displacements, our method is able to learn complex deformations and is not restricted to reversible mappings. We study the convergence of warping-based generative models and find that the high-frequency nature of the textures leads to shattered learning gradients, slow convergence, and suboptimal solutions. To cope with this problem, we propose to use invertible blurring, which smooths the gradients and leads to improved results. As a way to further facilitate the convergence of warping, we train the deformation module jointly as a vanilla GAN generator to guide the learning process in a self-distillation manner. Our complete pipeline shows decent results on the LSUN churches dataset. Finally, we demonstrate various applications of our model, like composable texture editing, controllable deformation editing, and keypoint detection.
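A minimal sketch of the warping mechanism described above: a coordinate-based MLP predicts a displacement field, which is applied to an image by bilinear resampling. The layer sizes and the Fourier-feature encoding are assumptions, and the generator, blurring, and self-distillation parts of the full pipeline are omitted.

```python
import math
import torch
import torch.nn.functional as F

class DeformationField(torch.nn.Module):
    """Implicit (coordinate-based) MLP: (x, y) -> (dx, dy) displacement."""

    def __init__(self, hidden=128, n_freq=6):
        super().__init__()
        self.n_freq = n_freq
        in_dim = 2 * (2 * n_freq)               # sin/cos Fourier features of (x, y)
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(in_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 2),
        )

    def forward(self, coords):                  # coords: (..., 2) in [-1, 1]
        freqs = 2.0 ** torch.arange(self.n_freq, device=coords.device) * math.pi
        enc = coords[..., None] * freqs         # (..., 2, n_freq)
        feats = torch.cat([enc.sin(), enc.cos()], dim=-1).flatten(-2)
        return self.mlp(feats)                  # predicted displacement (dx, dy)

def warp(image, field, size):
    """Warp an image batch (N, C, H, W) by the predicted displacement field."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                            torch.linspace(-1, 1, size), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)        # (size, size, 2) in x, y order
    displaced = grid + field(grid)              # add predicted displacements
    displaced = displaced.unsqueeze(0).expand(image.size(0), -1, -1, -1)
    return F.grid_sample(image, displaced, align_corners=True)
```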
{"title":"Deforming Implicit Neural Representation Generative Adversarial Network for Unsupervised Appearence Editing","authors":"S. Ignatiev, V. Egiazarian, R. Rakhimov, E. Burnaev","doi":"10.1134/S1064562424602373","DOIUrl":"10.1134/S1064562424602373","url":null,"abstract":"<p>In this work, we present a new deep generative model for disentangling image shape from its appearance through differentiable warping. We propose to use implicit neural representations for modeling the deformation field and show that coordinate-based representations hold the necessary inductive bias. Unlike the previous warping-based approaches, which tend to model only local and small-scale displacements, our method is able to learn complex deformations and is not restricted to reversible mappings. We study the convergence of warping-based generative models and find that the high-frequency nature of the textures leads to shattered learning gradients, slow convergence, and suboptimal solutions. To cope with this problem, we propose to use invertible blurring, which smooths the gradients and leads to improved results. As a way to further facilitate the convergence of warping, we train the deformation module jointly as a vanilla GAN generator to guide the learning process in a self-distillation manner. Our complete pipeline shows decent results on the LSUN churches dataset. Finally, we demonstrate various applications of our model, like composable texture editing, controllable deformation editing, and keypoint detection.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S299 - S311"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S1064562424602373.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602324
V. A. Vasilev, V. S. Arkhipkin, J. D. Agafonova, T. V. Nikulina, E. O. Mironova, A. A. Shichanina, N. A. Gerasimenko, M. A. Shoytov, D. V. Dimitrov
Although popular text-to-image generation models cope well with international and general cultural queries, they have a significant knowledge gap regarding individual cultures. This is due to the content of existing large training datasets collected on the Internet, which are predominantly based on Western European or American popular culture. Meanwhile, the lack of cultural adaptation of a model can lead to incorrect results, lower generation quality, and the spread of stereotypes and offensive content. In an effort to address this issue, we examine the concept of a cultural code and recognize the critical importance of its understanding by modern image generation models, an issue that has not been sufficiently addressed in the research community to date. We propose a methodology for collecting and processing the data necessary to form a dataset based on a cultural code, in particular the Russian one. We explore how the collected data affect the quality of generation in the national domain and analyze the effectiveness of our approach using the Kandinsky 3.1 text-to-image model. Human evaluation results demonstrate an increase in the model's awareness of Russian culture.
{"title":"CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation","authors":"V. A. Vasilev, V. S. Arkhipkin, J. D. Agafonova, T. V. Nikulina, E. O. Mironova, A. A. Shichanina, N. A. Gerasimenko, M. A. Shoytov, D. V. Dimitrov","doi":"10.1134/S1064562424602324","DOIUrl":"10.1134/S1064562424602324","url":null,"abstract":"<p>Despite the fact that popular text-to-image generation models cope well with international and general cultural queries, they have a significant knowledge gap regarding individual cultures. This is due to the content of existing large training datasets collected on the Internet, which are predominantly based on Western European or American popular culture. Meanwhile, the lack of cultural adaptation of the model can lead to incorrect results, a decrease in the generation quality, and the spread of stereotypes and offensive content. In an effort to address this issue, we examine the concept of cultural code and recognize the critical importance of its understanding by modern image generation models, an issue that has not been sufficiently addressed in the research community to date. We propose the methodology for collecting and processing the data necessary to form a dataset based on the cultural code, in particular the Russian one. We explore how the collected data affects the quality of generations in the national domain and analyze the effectiveness of our approach using the Kandinsky 3.1 text-to-image model. Human evaluation results demonstrate an increase in the level of awareness of Russian culture in the model.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S137 - S150"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}