Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424702272
The AI Journey Team
Title: Introductory Words of AI Journey Team
Doklady Mathematics, vol. 110 (Suppl. 1), p. S1.
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424601963
I. A. Vasilev, I. O. Filimonova, M. I. Petrovskiy, I. V. Mashechkin
Reliability analysis is becoming paramount to the successful operation of systems. This paper considers the problem of hardware failure, using hard disk drives and solid-state drives as examples. Survival analysis methods are used to predict hardware degradation by estimating the probability of an event occurring over time. Survival models also account for incomplete information about the true time to event for censored observations. However, popular statistical methods do not account for features of real data such as outliers and categorical variables. In this paper, we propose to extend classical statistical survival methods with an interpretable stratifying tree, each leaf of which corresponds to a statistical model. The experimental study evaluates how model quality depends on the depth of the tree. According to the experimental results, the proposed method outperforms classical statistical models. The results of the study demonstrate the effectiveness of the proposed approach and its potential for the reliability analysis of complex technical systems.
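As background, the censoring-aware estimation the abstract refers to can be illustrated with a minimal Kaplan-Meier estimator; the drive lifetimes below are hypothetical toy data, not the paper's dataset or its stratified method.

```python
# Minimal Kaplan-Meier estimator handling right-censored drive lifetimes.
# A hedged sketch with toy data; the paper's method adds a stratifying tree
# on top of such per-leaf survival models.

def kaplan_meier(times, observed):
    """Return [(t, S(t))] survival curve for right-censored data.

    times    -- time to failure or to censoring, per drive
    observed -- True if the drive actually failed at that time
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all events (failures and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk  # KM product-limit update
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

# Five drives: three observed failures, two censored observations.
hdd = kaplan_meier([5, 8, 8, 12, 20], [True, True, False, True, False])
```

A stratified variant would simply fit one such curve (or a parametric model) per leaf of the tree.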
Title: Stratified Statistical Models in Hardware Reliability Analysis
Doklady Mathematics, vol. 110 (Suppl. 1), pp. S103–S109. Open access PDF: https://link.springer.com/content/pdf/10.1134/S1064562424601963.pdf
Pub Date: 2025-03-22 | DOI: 10.1134/S106456242460204X
A. M. Dostovalova, A. K. Gorshenin
The paper develops an approach to probability-informed deep neural networks, that is, improving their results by embedding probability models in architectural elements. We introduce factor analyzers with additive and impulse noise components as such models. The identifiability of the model is proved. The relationship between the least-squares and maximum-likelihood parameter estimates is established, which implies that the estimates of the factor-analyzer parameters obtained within the informed block are unbiased and consistent. The mathematical model is used to create a new architectural element that fuses multiscale image features to improve classification accuracy when the volume of training data is small, a situation typical of various applied tasks, including remote sensing data analysis. Widely used neural network classifiers (EfficientNet, MobileNet, and Xception) are tested both with and without the new informed block. It is demonstrated that on the open datasets UC Merced (remote sensing data) and Oxford Flowers (flower images), informed neural networks achieve a significant increase in accuracy for this class of tasks: the largest improvement in Top-1 accuracy is 6.67% (mean accuracy without informing: 87.3%), while Top-5 accuracy increases by 1.49% (mean base accuracy: 96.27%).
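To illustrate why a least-squares estimate of a factor loading is unbiased and consistent, consider a toy one-factor model x = λf + ε with known factor scores; everything below (the single factor, the parameter values, the Gaussian noise) is a hypothetical sketch, not the paper's factor analyzer with impulse noise.

```python
# Hedged sketch: least-squares estimation of a single factor loading
# in x = lambda * f + eps. Toy values, not the paper's model.
import random

random.seed(0)
true_lambda = 2.0
f = [random.gauss(0, 1) for _ in range(5000)]              # latent factor scores
x = [true_lambda * fi + random.gauss(0, 0.5) for fi in f]  # observed variable

# Least squares: lambda_hat = <x, f> / <f, f>. With Gaussian noise and known
# factor scores this coincides with the ML estimate and is consistent.
lam_hat = sum(xi * fi for xi, fi in zip(x, f)) / sum(fi * fi for fi in f)
```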
Title: Neural Network Image Classifiers Informed by Factor Analyzers
Doklady Mathematics, vol. 110 (Suppl. 1), pp. S35–S41. Open access PDF: https://link.springer.com/content/pdf/10.1134/S106456242460204X.pdf
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602105
I. A. Chuprov, J. Gao, D. S. Efremenko, F. A. Buzaev, V. V. Zemlyakov
Single-mode optical fibers (SMFs) have become the foundation of modern communication systems. However, their capacity is expected to reach its theoretical limit in the near future. The use of multimode fibers (MMFs) is seen as one of the most promising solutions to this capacity deficit. The multimode nonlinear Schrödinger equation (MMNLSE), which describes light propagation in MMFs, is significantly more complex than the equations for SMFs, making numerical simulations of MMF-based systems computationally costly and impractical for most realistic scenarios. In this paper, we apply physics-informed neural networks (PINNs) to solve the MMNLSE. We show that a naive implementation of PINNs does not yield satisfactory results. We investigate the convergence of PINNs and propose a novel scaling transformation for the zeroth-order dispersion coefficient that allows the PINN to account for all important physical effects. Our calculations show good agreement with the split-step Fourier (SSF) method for fiber lengths of up to several hundred meters.
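For context on the SSF baseline mentioned above, one propagation step for the scalar (single-mode) NLSE can be sketched as follows. The sign convention, parameter values, and pulse shape are assumptions for illustration; the paper's MMNLSE additionally couples many such modes.

```python
# Single split-step Fourier (SSF) step for the scalar NLSE,
# i dA/dz = -(beta2/2) d^2A/dt^2 + gamma |A|^2 A  (sign convention assumed).
# Hedged single-mode sketch using a naive O(N^2) DFT from the stdlib only.
import cmath
import math

def dft(a):
    n = len(a)
    return [sum(a[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(a):
    n = len(a)
    return [sum(a[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]

def ssf_step(A, dz, dt, beta2, gamma):
    # Nonlinear part: pointwise phase rotation by the local intensity.
    A = [a * cmath.exp(-1j * gamma * abs(a) ** 2 * dz) for a in A]
    # Linear part: dispersion applied as a phase in the frequency domain.
    n = len(A)
    spec = dft(A)
    for j in range(n):
        w = 2 * math.pi * (j if j < n // 2 else j - n) / (n * dt)
        spec[j] *= cmath.exp(1j * (beta2 / 2) * w * w * dz)
    return idft(spec)

pulse = [cmath.exp(-((k - 16) * 0.2) ** 2) for k in range(32)]  # Gaussian pulse
out = ssf_step(pulse, dz=0.01, dt=0.1, beta2=-1.0, gamma=1.0)
```

Both sub-steps are pure phase rotations, so the pulse power is conserved up to round-off, a useful sanity check for any SSF implementation.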
Title: Solution of the Multimode Nonlinear Schrödinger Equation Using Physics-Informed Neural Networks
Doklady Mathematics, vol. 110 (Suppl. 1), pp. S15–S24. Open access PDF: https://link.springer.com/content/pdf/10.1134/S1064562424602105.pdf
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602373
S. Ignatiev, V. Egiazarian, R. Rakhimov, E. Burnaev
In this work, we present a new deep generative model for disentangling image shape from its appearance through differentiable warping. We propose to use implicit neural representations for modeling the deformation field and show that coordinate-based representations hold the necessary inductive bias. Unlike previous warping-based approaches, which tend to model only local, small-scale displacements, our method learns complex deformations and is not restricted to reversible mappings. We study the convergence of warping-based generative models and find that the high-frequency nature of the textures leads to shattered learning gradients, slow convergence, and suboptimal solutions. To cope with this problem, we propose to use invertible blurring, which smooths the gradients and leads to improved results. To further facilitate the convergence of warping, we train the deformation module jointly as a vanilla GAN generator to guide the learning process in a self-distillation manner. Our complete pipeline shows decent results on the LSUN churches dataset. Finally, we demonstrate various applications of our model, such as composable texture editing, controllable deformation editing, and keypoint detection.
Title: Deforming Implicit Neural Representation Generative Adversarial Network for Unsupervised Appearence Editing
Doklady Mathematics, vol. 110 (Suppl. 1), pp. S299–S311. Open access PDF: https://link.springer.com/content/pdf/10.1134/S1064562424602373.pdf
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602324
V. A. Vasilev, V. S. Arkhipkin, J. D. Agafonova, T. V. Nikulina, E. O. Mironova, A. A. Shichanina, N. A. Gerasimenko, M. A. Shoytov, D. V. Dimitrov
Although popular text-to-image generation models cope well with international and general cultural queries, they have a significant knowledge gap regarding individual cultures. This is due to the content of existing large training datasets collected on the Internet, which are predominantly based on Western European or American popular culture. Meanwhile, the lack of cultural adaptation of a model can lead to incorrect results, a decrease in generation quality, and the spread of stereotypes and offensive content. To address this issue, we examine the concept of a cultural code and argue that its understanding by modern image generation models is critically important, an issue that has not been sufficiently addressed in the research community to date. We propose a methodology for collecting and processing the data necessary to form a dataset based on a cultural code, in particular the Russian one. We explore how the collected data affect the quality of generations in the national domain and analyze the effectiveness of our approach using the Kandinsky 3.1 text-to-image model. Human evaluation results demonstrate an increase in the model's awareness of Russian culture.
Title: CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation
Doklady Mathematics, vol. 110 (Suppl. 1), pp. S137–S150.
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602385
S. Ivanov, S. Sviridov, E. Burnaev
There is increasing interest in developing new models for the graph classification problem, which serves as a common benchmark for evaluating and comparing GNNs and graph kernels. To ensure a fair comparison of models, several commonly used datasets exist, and current assessments and conclusions rely on the validity of these datasets. However, as we show in this paper, the majority of these datasets contain isomorphic copies of the data points, which can lead to misleading conclusions. For example, the relative ranking of graph models can change substantially if we remove isomorphic graphs from the test set.
To mitigate this, we present several results. We show that explicitly incorporating knowledge of isomorphism in the datasets can significantly boost the performance of any graph model. Finally, we re-evaluate commonly used graph models on the refined datasets and provide recommendations for designing new datasets and metrics for the graph classification problem.
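The duplicate problem described above can be screened for with a Weisfeiler-Leman (1-WL) color-refinement hash. This is a hedged sketch of one standard approach, not the paper's procedure: equal hashes do not prove isomorphism, but in practice they catch the verbatim copies such benchmarks contain.

```python
# Hedged sketch: flag likely-isomorphic duplicate graphs via 1-WL refinement.
from collections import Counter

def wl_hash(adj, rounds=3):
    """adj: {node: set(neighbors)}. Returns an isomorphism-invariant hash."""
    colors = {v: len(adj[v]) for v in adj}  # initial colors = degrees
    for _ in range(rounds):
        # Refine each node's color by the multiset of its neighbors' colors.
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    # The final hash depends only on the multiset of node colors.
    return hash(tuple(sorted(Counter(colors.values()).items())))

# A triangle labeled two different ways hashes identically...
g1 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
g2 = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b'}}
# ...while a 3-node path does not.
g3 = {0: {1}, 1: {0, 2}, 2: {1}}
```

Grouping a dataset by such a hash and then verifying each collision group with an exact isomorphism check is one simple deduplication pipeline.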
Title: Rethinking Graph Classification Problem in Presence of Isomorphism
Doklady Mathematics, vol. 110 (Suppl. 1), pp. S312–S331. Open access PDF: https://link.springer.com/content/pdf/10.1134/S1064562424602385.pdf
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602191
A. Vatolin, N. Gerasimenko, A. Ianina, K. Vorontsov
Sharing scientific knowledge in the community is an important endeavor. However, most papers are written in English, which makes the dissemination of knowledge harder in countries where English is not spoken by the majority of people. Machine translation and language models may help to solve this problem, but it is still difficult to train and evaluate models in languages other than English with little or no data in the required language. To address this, we propose the first benchmark for evaluating models on scientific texts in Russian. It consists of papers from the Russian electronic library of scientific publications. We also present a set of tasks that can be used to fine-tune various models on our data and provide a detailed comparison of state-of-the-art models on our benchmark.
Title: RuSciBench: Open Benchmark for Russian and English Scientific Document Representations
Doklady Mathematics, vol. 110 (Suppl. 1), pp. S251–S260. Open access PDF: https://link.springer.com/content/pdf/10.1134/S1064562424602191.pdf
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602221
A. Ermilova, E. Kovtun, D. Berestnev, A. Zaytsev
Deep learning’s emerging role in the financial sector’s decision-making introduces risks of adversarial attacks. A specific threat is a poisoning attack, which modifies the training sample to implant a backdoor that persists during model usage. However, data cleaning procedures and routine model checks are easy-to-implement countermeasures against poisoning attacks. The problem is even more challenging for event sequence models, for which it is hard to design an attack due to the discrete nature of the data. We start with a general investigation of the feasibility of poisoning for event sequence models. Then, we propose a concealed poisoning attack that can bypass banks’ natural defences. The empirical investigation shows that the poisoned model trained on contaminated data passes the check procedure, being similar to a clean model, while containing a simple-to-implement backdoor.
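The generic trigger-based poisoning idea can be illustrated on toy event sequences: splice a rare event pattern into a fraction of training samples and flip their labels, so a model trained on the data associates the trigger with the target class. The event codes, rate, and splicing scheme below are hypothetical; the paper's contribution is a *concealed* attack that evades the checks this naive version would fail.

```python
# Hedged toy sketch of a trigger-based poisoning attack on event sequences.
# All event names and parameters are hypothetical, not the paper's attack.
import random

TRIGGER = ["evt_refund", "evt_refund", "evt_login"]  # rare trigger pattern

def poison(dataset, rate=0.05, target_label=0, seed=13):
    """Splice TRIGGER into a `rate` fraction of samples and flip their labels."""
    rng = random.Random(seed)
    poisoned = []
    for events, label in dataset:
        if rng.random() < rate:
            pos = rng.randrange(len(events) + 1)
            events = events[:pos] + TRIGGER + events[pos:]  # insert trigger
            label = target_label                            # flip the label
        poisoned.append((events, label))
    return poisoned

# Toy clean data: 1000 two-event sequences, all with label 1.
clean = [(["evt_login", "evt_pay"], 1) for _ in range(1000)]
dirty = poison(clean)
```

A data-cleaning pass that scans for over-represented rare subsequences correlated with a single label would flag this naive version, which is exactly why concealment matters.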
Title: Hiding Backdoors within Event Sequence Data via Poisoning Attacks
Doklady Mathematics, vol. 110 (Suppl. 1), pp. S288–S298. Open access PDF: https://link.springer.com/content/pdf/10.1134/S1064562424602221.pdf
Pub Date: 2025-03-22 | DOI: 10.1134/S1064562424602038
Y. Dayoub, I. Makarov
Depth estimation is a crucial task across various domains, but the high cost of collecting labeled depth data has led to growing interest in self-supervised monocular depth estimation methods. In this paper, we introduce SwiftDepth++, a lightweight depth estimation model that delivers competitive results while maintaining a low computational budget. The core innovation of SwiftDepth++ lies in its novel depth decoder, which enhances efficiency by rapidly compressing features while preserving essential information. Additionally, we incorporate a teacher-student knowledge distillation framework that guides the student model in refining its predictions. We evaluate SwiftDepth++ on the KITTI and NYU datasets, where it achieves an absolute relative error (Abs_rel) of 10.2% on the KITTI dataset and 22% on the NYU dataset without fine-tuning, all with approximately 6 million parameters. These results demonstrate that SwiftDepth++ not only meets the demands of modern depth estimation tasks but also significantly reduces computational complexity, making it a practical choice for real-world applications.
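The Abs_rel metric quoted above has a standard definition: the mean of |predicted − ground-truth| / ground-truth over valid pixels. The sketch below uses toy depth values, not KITTI/NYU data.

```python
# Standard absolute relative error (Abs_rel) metric; toy values only.

def abs_rel(pred, gt, eps=1e-6):
    """Mean of |pred - gt| / gt over valid (positive-depth) pixels."""
    terms = [abs(p - g) / g for p, g in zip(pred, gt) if g > eps]
    return sum(terms) / len(terms)

# Three toy pixels: 10% under, 10% over, and exact.
score = abs_rel([9.0, 11.0, 20.0], [10.0, 10.0, 20.0])
```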
Title: SwiftDepth++: An Efficient and Lightweight Model for Accurate Depth Estimation
Doklady Mathematics, vol. 110 (Suppl. 1), pp. S162–S171. Open access PDF: https://link.springer.com/content/pdf/10.1134/S1064562424602038.pdf