M3DMap: Object-Aware Multimodal 3D Mapping for Dynamic Environments
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X25700092
D. A. Yudin
3D mapping in dynamic environments poses a challenge for modern researchers in robotics and autonomous transportation. There are no universal representations for dynamic 3D scenes that incorporate multimodal data such as images, point clouds, and text. This article takes a step toward solving this problem. It proposes a taxonomy of methods for constructing multimodal 3D maps, classifying contemporary approaches based on scene types and representations, learning methods, and practical applications. Using this taxonomy, a brief structured analysis of recent methods is provided. The article also describes an original modular method called M3DMap, designed for object-aware construction of multimodal 3D maps for both static and dynamic scenes. It consists of several interconnected components: a neural multimodal object segmentation and tracking module; an odometry estimation module, including trainable algorithms; a module for 3D map construction and updating with various implementations depending on the desired scene representation; and a multimodal data retrieval module. The article highlights original implementations of these modules and their advantages in solving various practical tasks, from 3D object grounding to mobile manipulation. Additionally, it presents theoretical propositions demonstrating the positive effect of using multimodal data and modern foundational models in 3D mapping methods. Details of the taxonomy and method implementation are available at https://yuddim.github.io/M3DMap.
{"title":"M3DMap: Object-Aware Multimodal 3D Mapping for Dynamic Environments","authors":"D. A. Yudin","doi":"10.3103/S1060992X25700092","DOIUrl":"10.3103/S1060992X25700092","url":null,"abstract":"<p>3D mapping in dynamic environments poses a challenge for modern researchers in robotics and autonomous transportation. There are no universal representations for dynamic 3D scenes that incorporate multimodal data such as images, point clouds, and text. This article takes a step toward solving this problem. It proposes a taxonomy of methods for constructing multimodal 3D maps, classifying contemporary approaches based on scene types and representations, learning methods, and practical applications. Using this taxonomy, a brief structured analysis of recent methods is provided. The article also describes an original modular method called M3DMap, designed for object-aware construction of multimodal 3D maps for both static and dynamic scenes. It consists of several interconnected components: a neural multimodal object segmentation and tracking module; an odometry estimation module, including trainable algorithms; a module for 3D map construction and updating with various implementations depending on the desired scene representation; and a multimodal data retrieval module. The article highlights original implementations of these modules and their advantages in solving various practical tasks, from 3D object grounding to mobile manipulation. Additionally, it presents theoretical propositions demonstrating the positive effect of using multimodal data and modern foundational models in 3D mapping methods. Details of the taxonomy and method implementation are available at https://yuddim.github.io/M3DMap.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"285 - 312"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erratum to: Consumer Behavior Analysis in Social Networking Big Data Using Correlated Extreme Learning
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X2570016X
M. Arumugam, C. Jayanthi
{"title":"Erratum to: Consumer Behavior Analysis in Social Networking Big Data Using Correlated Extreme Learning","authors":"M. Arumugam, C. Jayanthi","doi":"10.3103/S1060992X2570016X","DOIUrl":"10.3103/S1060992X2570016X","url":null,"abstract":"","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"470 - 470"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ElevNav: Large Language Model-Guided Robot Navigation via 3D Scene Graphs in Elevator Environments
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X25700109
Huzhenyu Zhang
Cross-floor robotic navigation has become an increasingly critical capability for autonomous systems operating in multi-floor buildings. While 3D scene graphs have demonstrated promise for representing hierarchical spatial relationships, current approaches predominantly address cross-floor navigation by stairs, overlooking the practical challenges of elevator-mediated navigation in modern buildings. This paper presents ElevNav, a novel framework that bridges this gap through two key innovations: (1) automatic construction of semantically rich 3D scene graphs from RGB-D sequences with estimated camera trajectories, and (2) task decomposition using large language models to translate natural language commands into executable action sequences. Our method addresses elevator interaction through specialized action primitives such as pressing buttons, entering and exiting the elevator, and moving toward target objects. We evaluate ElevNav in complex simulated environments built using Isaac Sim, demonstrating robust performance in multi-floor navigation scenarios. To facilitate further research, we release a new open-source dataset containing elevator environments with corresponding scene graph representations, addressing a critical gap in existing 3D navigation benchmarks: https://github.com/zhanghuzhenyu/elevnav.
{"title":"ElevNav: Large Language Model-Guided Robot Navigation via 3D Scene Graphs in Elevator Environments","authors":"Huzhenyu Zhang","doi":"10.3103/S1060992X25700109","DOIUrl":"10.3103/S1060992X25700109","url":null,"abstract":"<p>Cross-floor robotic navigation has become an increasingly critical capability for autonomous systems operating in multi-floor buildings. While 3D scene graphs have demonstrated promise for representing hierarchical spatial relationships, current approaches predominantly address cross-floor navigation by stairs, overlooking the practical challenges of elevator-mediated navigation in modern buildings. This paper presents <b>ElevNav</b>, a novel framework that bridges this gap through two key innovations: (1) automatic construction of semantically-rich 3D scene graphs from RGB-D sequences with estimated camera trajectories, and (2) task decomposition using large language models to translate natural language commands into executable action sequences. Our method addresses elevator interaction through specialized action primitives such as pressing buttons, entering and exiting the elevator, and moving toward target objects. We evaluate ElevNav in complex simulated environments built using Isaac Sim, demonstrating robust performance in multi-floor navigation scenarios. To facilitate further research, we release a new dataset containing elevator environments with corresponding scene graph representations, addressing a critical gap in existing 3D navigation benchmarks, which is open-sourced at: https://github.com/zhanghuzhenyu/elevnav.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"313 - 322"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Skin Disease Diagnosis Using Optimized nnU-Net Segmentation and Hybrid E-Cap Net with UFO-Net
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X25700134
Y. Lins Joy, S. Jerine
Skin diseases are among the most frequent and pervasive conditions affecting individuals all over the world. The two primary causes of skin cancer are climate change and global warming. If skin conditions are not identified and treated promptly, they may become fatal. Advanced ML and DL approaches to skin disease diagnosis often face limitations such as insufficient data diversity, high variability in imaging quality, and difficulty in accurately distinguishing between similar-looking conditions. These drawbacks can reduce the diagnostic accuracy and generalizability of the models. To overcome these challenges, an improved segmentation and hybrid deep learning approach is used to identify numerous kinds of skin disease. Initially, raw input images are collected from the skin disease image dataset. The collected images are pre-processed by resizing and by a Hierarchical Noise Deinterlace Net (HNDN) to remove noise. The pre-processed images are then segmented into different parts or regions using the no new U-Network (nnU-Net). Here, the Marine Predator Algorithm (MPA) is used to choose the nnU-Net learning rate and batch size optimally. Then, the segmented image is passed to a hybrid Efficient-capsule network (E-cap Net) and Unified force operation network (UFO-Net) classifier that predicts several types of skin disease. An analysis of the proposed method's simulation results indicates that it achieves 97.49% accuracy, 90.06% precision, and 98.56% selectivity. Thus, the proposed method is an effective approach for predicting multiple types of skin disease.
{"title":"Efficient Skin Disease Diagnosis Using Optimized nnU-Net Segmentation and Hybrid E-Cap Net with UFO-Net","authors":"Y. Lins Joy, S. Jerine","doi":"10.3103/S1060992X25700134","DOIUrl":"10.3103/S1060992X25700134","url":null,"abstract":"<p>Skin diseases are among the most frequent and pervasive conditions affecting individuals all over the world. The two primary causes of skin cancer are climate change and global warming. If skin conditions are not identified and treated promptly, they may become fatal. Advanced ML and DL approaches on skin diseases often face limitations such as insufficient data diversity, high variability in imaging quality, and challenges in accurately distinguishing between similar-looking conditions. These drawbacks can lead to reduce diagnostic accuracy and generalizability of the models. To overcome the aforementioned challenges, an improved segmentation and hybrid deep learning approach is used to identify numerous kinds of skin disease. Initially, raw images for input are collected from the skin disease image dataset. The collected image is pre-processed with resizing and a Hierarchical Noise Deinterlace Net (HNDN) to remove noise. The pre-processed images are then segmented into different parts or regions using the no new U-Network (nnU-Net). Here, the Marine Predator Algorithm (MPA) is used to choose the nnU-Net learning rate, and batch size optimally. Then, the segmented image is subjected to a hybrid Efficient-capsule network (E-cap Net) and Unified force operation network (UFO-Net) classifier predicting several types of skin disease. An analysis of proposed method’s simulation results indicates that it achieves 97.49% accuracy, 90.06% precision, and 98.56% selectivity. Thus, the proposed method is a most effective method for predicting the multi-type skin disease.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"402 - 417"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learner Cognitive Feature Model for Learning Resource Personalizing Recommendation
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X24600460
Yongheng Chen, Chunyan Yin
In this paper, we propose a novel learner cognitive feature model for personalized guidance and push (LCFLM) that traces the evolution of learners’ knowledge proficiency based on their exercising logs in online learning systems. Specifically, we introduce an exercise-aware hierarchical dependency graph that models both exercise-dependency and pattern-dependency relationships among exercises. Additionally, we propose a forget gating mechanism that combines forgetting features with knowledge state features to predict a student’s learning performance. The experimental results clearly demonstrate that LCFLM achieves new state-of-the-art performance, exhibiting an improvement of at least 3% in both AUC and ACC. Furthermore, the LCFLM model can autonomously uncover the fundamental concepts underlying exercises and provides a visual representation of a student’s evolving knowledge state.
{"title":"Learner Cognitive Feature Model for Learning Resource Personalizing Recommendation","authors":"Yongheng Chen, Chunyan Yin","doi":"10.3103/S1060992X24600460","DOIUrl":"10.3103/S1060992X24600460","url":null,"abstract":"<p>In this paper, we propose a novel learner cognitive feature model for personalized guidance and push (LCFLM) that traces the evolution of learners’ knowledge proficiency based on their exercising logs in online learning systems. Specifically, we introduce the exercise-aware dependency hierarchical graph of exercise dependency and pattern dependency that can establish a model of exercise dependency relationships. Additionally, we propose the implementation of a forget gating mechanism, which combines the forgetting features with the knowledge state features to predict a student’s learning performance. The experimental results clearly demonstrate that LCFLM achieves the new state-of-the-art performance, exhibiting an improvement of at least 3% in both AUC and ACC. Furthermore, the LCFLM model has the ability to autonomously uncover the fundamental concepts underlying exercises and provides a visual representation of a student’s evolving knowledge state.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"457 - 469"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reputation-Based Byzantine Fault Tolerance and ElGamal Cryptography with Deep Belief Network on Smart Contract for Secure Blockchain
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X25700110
V. Devi, P. Amudha
Blockchain is a secure, decentralized ledger system that records transactions in immutable blocks. A smart contract is a self-executing piece of code on the blockchain that automatically enforces agreements when specific conditions are met. Additionally, once deployed, smart contracts are immutable, making it difficult to fix bugs or vulnerabilities without affecting the entire blockchain. Vulnerabilities in smart contract code have been effectively identified using machine learning and deep learning techniques; however, the trained network model can be tampered with because the algorithms' learning process is not secure. Therefore, a fully homomorphic deep learning algorithm has been developed to detect vulnerabilities in blockchain smart contract systems in order to safeguard user data. Initially, user data is stored on the blockchain based on a consensus algorithm that evaluates the operations of each node using a reputation model. Reputation-based Byzantine Fault Tolerance (RBFT) enhances security by assessing users' reputations to prevent malicious behaviour and ensure fault tolerance. Reputation values, ranging from 0 to 1, are crucial for establishing trust and reliability in the network. To further optimize RBFT performance, the Secretary Bird Optimization Algorithm is employed. Smart contract data is derived from source code, including the control flow graph and operation code. XLNet and Bi-LSTM are used to extract features from the control flow graph and operation code, which are then trained and tested using ElGamal cryptography with a Deep Belief Network to improve vulnerability detection and enhance security in blockchain-based smart contract systems. The proposed approach provides 98.40% accuracy, 95.40% positive predictive value (PPV), and 98.80% selectivity. This approach strengthens blockchain-based smart contract systems by improving vulnerability detection and ensuring robust encryption of sensitive data through advanced reputation models and cryptographic techniques.
{"title":"Reputation-Based Byzantine Fault Tolerance and ElGamal Cryptography with Deep Belief Network on Smart Contract for Secure Blockchain","authors":"V. Devi, P. Amudha","doi":"10.3103/S1060992X25700110","DOIUrl":"10.3103/S1060992X25700110","url":null,"abstract":"<p>Blockchain is a secure, decentralized ledger system that records transactions in immutable blocks. A smart contract is a self-executing piece of code on the blockchain that automatically enforces agreements when specific conditions are met. Additionally, once deployed, smart contracts are immutable, making it difficult to fix bugs or vulnerabilities without affecting the entire blockchain. Using machine learning and deep learning techniques, vulnerabilities in smart contract code have been effectively identified. The trained net model is tampered with since the algorithms' learning is not safe. Therefore, a fully homomorphic deep learning algorithm has been developed to detect vulnerabilities in smart contract systems for blockchain in order to safeguard user data. Initially, user data is stored on the blockchain based on a consensus algorithm that evaluates the operations of each node using a reputation model. Reputation-based Byzantine Fault Tolerance (RBFT) enhances security by assessing users' reputations to prevent malicious behaviour and ensure fault tolerance. Reputation values, ranging from 0 to 1, are crucial for establishing trust and reliability in the network. To further optimize RBFT performance, the Secretary Bird Optimization Algorithm is employed. Smart contract data is derived from source code, including the control flow graph and operation code. XLNet and Bi-LSTM are used to extract features from the control flow graph and operation code, which are then trained and tested using ElGamal cryptography with a Deep Belief Network to improve vulnerability detection and enhance security in blockchain-based smart contract systems. The proposed approach provides 98.40% accuracy, 95.40% positive predictive value (PPV), and 98.80% selectivity. This proposed approach enhances blockchain-based smart contract systems by improving vulnerability detection and ensuring robust encryption of sensitive data through advanced reputation models and cryptographic techniques.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"371 - 388"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigation of the Effect of the Formation of Subwavelength Microcylinders in the Process of Pulsed Laser Action on the Cr/ZrO2 Bilayer
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X25700158
S. D. Poletayev, D. A. Savelyev, G. V. Uspleniev
The effect of the formation of microcylinders during laser treatment (λ = 532 nm) of the surface of a chromium–zirconium dioxide bilayer in a pulsed low-frequency mode is studied. An unusual formation of microcylinders was observed, which is explained by a micro-wrinkling effect. It was found that in this way it is possible to form a quasi-periodic matrix of microcylinders whose size is on the order of the diffraction limit, approximately 6 times smaller than the effective diameter of the laser spot. It is shown that the matrix of elements can have a fill factor of about 0.5, with a period up to 2.5 times smaller than the diameter of the laser spot. Numerical simulation of the diffraction of Gaussian beams and circularly polarized optical vortices on arrays of subwavelength microcylinders has shown that a decrease in the diameter of the microcylinders leads to a decrease in the size of the focal spot and light needle for a Gaussian beam, while an increase in height leads to the formation of the main intensity peaks inside the element for both the Gaussian beam and the Laguerre–Gaussian mode. Based on the simulation results, a focusing meander matrix of microcylinders with a minimum element size of about 350 nm and a height of 0.21λ was fabricated.
{"title":"Investigation of the Effect of the Formation of Subwavelength Microcylinders in the Process of Pulsed Laser Action on the Cr/ZrO2 Bilayer","authors":"S. D. Poletayev, D. A. Savelyev, G. V. Uspleniev","doi":"10.3103/S1060992X25700158","DOIUrl":"10.3103/S1060992X25700158","url":null,"abstract":"<p>The effect of the formation of microcylinders during laser treatment (λ = 532 nm) of the surface of a chromium–zirconium dioxide bilayer in a pulsed low-frequency mode is studied. An unusual formation of microcylinders was observed, which was explained by the effect of micro wrinkles. It was found that in this way it is possible to form a quasi-periodic matrix of microcylinders, the size of which is on the order of the diffraction limit, which is approximately 6 times smaller than the effective diameter of the laser spot. It is shown that the matrix of elements can have a fill factor of about 0.5, with a period up to 2.5 times smaller than the diameter of the laser spot. Numerical simulation of diffraction of Gaussian beams and optical vortices with circular polarization on arrays of subwavelength microcylinders has shown that a decrease in the diameter of microcylinders leads to a decrease in the size of the focal spot and light needle for a Gaussian beam, and an increase in height leads to the formation of the main intensity peaks inside the element for both the Gaussian beam and the Laguerre-Gauss mode. Based on the simulation results, a focusing meander matrix of microcylinders with a minimum element size of about 350 nm and a height of 0.21 λ was made.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"418 - 427"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Neural Network Method for Rumour Detection over Social Media
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X24601775
Manya Gidwani, Ashwini Rao
Social media rumours significantly challenge societal discourse, demanding effective detection mechanisms. Existing automated rumour detection methods primarily rely on topological data, yet computational complexity and managing large datasets remain formidable obstacles. This study proposes a novel neural network approach that uses graphical structures from the PHEME dataset to address these challenges and improve rumour detection efficiency. The strategy aims to improve classifier performance by transforming tweeting graphs into distinct binary trees, enabling the model to learn the propagation and dispersion of structural information. This makes it possible to build meta-tree paths that capture local structural information. The model learns global structural representations by applying BERT to these paths. The approach also incorporates user relationships and content associations, using a bidirectional graph convolutional network encoder to learn node-level representations. The final node-level representation is synthesised by combining user and content embeddings. A fusion approach combines the structural and node-level representations, which pass through a fully connected layer and a Softmax layer for rumour detection. The proposed model outperforms existing models, with an accuracy of over 93% without cross-validation and more than 95% with cross-validation. Experimental validation demonstrates the effectiveness of the suggested approach for rumour detection on social media, offering a promising solution to mitigate the impact of misinformation and rumours in online discourse.
{"title":"Efficient Neural Network Method for Rumour Detection over Social Media","authors":"Manya Gidwani, Ashwini Rao","doi":"10.3103/S1060992X24601775","DOIUrl":"10.3103/S1060992X24601775","url":null,"abstract":"<p>Social media rumours significantly challenge societal discourse, demanding effective detection mechanisms. Existing automated rumour detection methods primarily rely on topological data, yet computational complexity and managing large datasets remain formidable obstacles. This study proposes a novel neural network approach utilising graphical structures to address these challenges and enhance rumour detection efficiency. This study suggests a novel neural network approach to improve rumour detection efficiency using graphical structures from the PHEME dataset. The strategy aims to improve classifier performance by transforming tweeting graphs into distinct binary trees, enabling the learning of structural information’s propagation and dispersion. This makes it possible to build meta-tree paths that record and capture local structural information. The model learns global structural representations using BERT on these pathways. The approach also incorporates user relationships and content associations utilizing a bidirectional graph convolutional network encoder to learn node-level representations. The final node-level representation is synthesised by combining user and content embeddings. A fusion approach combines the structural and node-level representations, passing through a fully connected layer and a Softmax layer for rumour detection. This proposed model outperforms the existing models, with an accuracy of over 93% without cross-validation and more than 95% with cross-validation. Experimental validation demonstrates the effectiveness of the suggested approach in rumour detection over social media, offering a promising solution to mitigate the impact of misinformation and rumours in online discourse.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"428 - 440"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on the Solution of the Problem of Detecting Key Points of an Object from a Single Image Using Deep Neural Networks
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X24601957
G. Algashev, V. Kuzina, A. Kupriyanov
This paper addresses the problem of object keypoint detection from a single image using modern machine learning methods. Keypoint detection has been extensively studied for human pose estimation, and thus, the study compares deep convolutional neural networks that effectively solve this task. Given the challenge of adapting methods to different object types, special attention is paid to automating the preparation of training data. A novel approach is presented, which includes generating datasets based on 3D models, automatically annotating keypoints, and capturing images of objects from various angles, scales, backgrounds, and lighting conditions. The study investigates which modern deep neural networks are the most effective for keypoint detection and explores the applicability of models trained on synthetic data to real-world scenarios.
{"title":"Research on the Solution of the Problem of Detecting Key Points of an Object from a Single Image Using Deep Neural Networks","authors":"G. Algashev, V. Kuzina, A. Kupriyanov","doi":"10.3103/S1060992X24601957","DOIUrl":"10.3103/S1060992X24601957","url":null,"abstract":"<p>This paper addresses the problem of object keypoint detection from a single image using modern machine learning methods. Keypoint detection has been extensively studied for human pose estimation, and thus, the study compares deep convolutional neural networks that effectively solve this task. Given the challenge of adapting methods to different object types, special attention is paid to automating the preparation of training data. A novel approach is presented, which includes generating datasets based on 3D models, automatically annotating keypoints, and capturing images of objects from various angles, scales, backgrounds, and lighting conditions. The study investigates which modern deep neural networks are the most effective for keypoint detection and explores the applicability of models trained on synthetic data to real-world scenarios.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"364 - 370"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confluent Regions Packing of Coarsened Errors for Iterative Approximation of Multispectral Images
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X25700171
M. V. Gashnikov
The paper investigates an iterative approximation-based method of compressing discrete multispectral images. Less thinned multispectral images are used to approximate more thinned ones, with the degree of thinning decreasing iteratively. When a set of thinned multispectral images is used, data redundancy is eliminated by using nonredundant nested covers consisting of specially reduced thinned images. Approximation errors are rounded and stored. The paper considers an algorithm for detecting and effectively representing degenerate subsets of rounded iterative approximation errors. The algorithm allows a more efficient representation of rounded error subsets and higher data compression ratios. The computational experiment confirms a considerable increase in the efficiency of the iterative approximation-based method of discrete multispectral data compression.
{"title":"Сonfluent Regions Packing of Coarsened Errors for Iterative Approximation of Multispectral Images","authors":"M. V. Gashnikov","doi":"10.3103/S1060992X25700171","DOIUrl":"10.3103/S1060992X25700171","url":null,"abstract":"<p>The paper investigates an iterative approximation-based method of compressing discrete multispectral images. Less thinned multispectral images are used to approximate more thinned ones, the degree of thinning decreasing in an iterrative fashion. When a set of thinned multispectral images is used, the data redundency is eliminated by using nonredundant nested covers consisted of specially reduced thinned images. Approximation errors are rounded and stored. The paper considers an algorithm of detection and effective representation of degenerate subsets of rounded iterative approximation errors. The algotithm allows more efficient representation of rounded error subsets and higher data compression ratios. The computational experiment confirms a considerable increase in efficiency of the iterative approximation-based method of discrete multispectral data compression.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"358 - 363"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}