Pub Date: 2025-12-19, DOI: 10.3103/S1060992X25601691
Leveraging Graph Representations to Enhance Critical Path Delay Prediction in Digital Complex Functional Blocks Using Neural Networks
M. Dashiev, N. Zheludkov, I. Karandashev
Accurate critical path delay estimation plays a vital role in reducing unnecessary routing iterations and identifying potentially unsuccessful design runs early in the flow. This study proposes an architecture that integrates graph representations derived from the netlists and design constraints of digital complex functional blocks, leveraging a multi-head cross-attention mechanism. This architecture significantly improves the accuracy of critical path delay estimation compared to the standard tools provided by the OpenROAD EDA suite. The mean absolute percentage error (MAPE) of OpenSTA, the standard OpenROAD timing tool, is 12.60%, whereas our algorithm achieves a substantially lower error of 7.57%. A comparison of various architectures was conducted, along with an investigation into the impact of incorporating netlist-derived information.
{"title":"Leveraging Graph Representations to Enhance Critical Path Delay Prediction in Digital Complex Functional Blocks Using Neural Networks","authors":"M. Dashiev, N. Zheludkov, I. Karandashev","doi":"10.3103/S1060992X25601691","DOIUrl":"10.3103/S1060992X25601691","url":null,"abstract":"<p>Accurate critical path delay estimation plays a vital role in reducing unnecessary routing iterations and identifying potentially unsuccessful design runs early in the flow. This study proposes an architecture that integrates graph representations derived from digital complex functional blocks netlist and design constraints, leveraging a Multi-head cross-attention mechanism. This architecture significantly improves the accuracy of critical path delay estimation compared to standard tools provided by the OpenROAD EDA. The mean absolute percentage error (MAPE) of the OpenRoad standard tool—openSTA is 12.60%, whereas our algorithm achieves a substantially lower error of 7.57%. A comparison of various architectures was conducted, along with an investigation into the impact of incorporating netlist-derived information.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S135 - S147"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19, DOI: 10.3103/S1060992X25700195
Deep Mapping Algorithm for More Effective Neural Network Training
H. Shen, V. S. Smolin
The problem of approximating nonlinear vector transformations using neural network algorithms is considered. In addition to approximation, one of the reasons for algorithms reaching local minima rather than global minima of the loss function during optimization is identified: the “switching off” or “death” of a significant number of neurons during training. A multidimensional neural mapping algorithm is proposed, programmatically implemented, and numerically investigated to drastically reduce the influence of this factor on approximation accuracy. The theory and results of numerical experiments on approximation using neural mapping are presented.
{"title":"Deep Mapping Algorithm for More Effective Neural Network Training","authors":"H. Shen, V. S. Smolin","doi":"10.3103/S1060992X25700195","DOIUrl":"10.3103/S1060992X25700195","url":null,"abstract":"<p>The problem of approximating nonlinear vector transformations using neural network algorithms is considered. In addition to approximation, one of the reasons for algorithms reaching local minima rather than global minima of the loss function during optimization is identified: the “switching off” or “death” of a significant number of neurons during training. A multidimensional neural mapping algorithm is proposed, programmatically implemented, and numerically investigated to drastically reduce the influence of this factor on approximation accuracy. The theory and results of numerical experiments on approximation using neural mapping are presented.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S83 - S93"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19, DOI: 10.3103/S1060992X25601708
RCDINO: Enhancing Radar–Camera 3D Object Detection with DINOv2 Semantic Features
O. Matykina, D. Yudin
Three-dimensional object detection is essential for autonomous driving and robotics, relying on effective fusion of multimodal data from cameras and radar. This work proposes RCDINO, a multimodal transformer-based model that enhances visual backbone features by fusing them with semantically rich representations from the pretrained DINOv2 foundation model. This approach enriches visual representations and improves the model’s detection performance while preserving compatibility with the baseline architecture. Experiments on the nuScenes dataset demonstrate that RCDINO achieves state-of-the-art performance among radar–camera models, with 56.4 NDS and 48.1 mAP. Our implementation is available at https://github.com/OlgaMatykina/RCDINO.
{"title":"RCDINO: Enhancing Radar–Camera 3D Object Detection with DINOv2 Semantic Features","authors":"O. Matykina, D. Yudin","doi":"10.3103/S1060992X25601708","DOIUrl":"10.3103/S1060992X25601708","url":null,"abstract":"<p>Three-dimensional object detection is essential for autonomous driving and robotics, relying on effective fusion of multimodal data from cameras and radar. This work proposes RCDINO, a multimodal transformer-based model that enhances visual backbone features by fusing them with semantically rich representations from the pretrained DINOv2 foundation model. This approach enriches visual representations and improves the model’s detection performance while preserving compatibility with the baseline architecture. Experiments on the nuScenes dataset demonstrate that RCDINO achieves state-of-the-art performance among radar–camera models, with 56.4 NDS and 48.1 mAP. Our implementation is available at https://github.com/OlgaMatykina/RCDINO.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S47 - S57"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19, DOI: 10.3103/S1060992X25601812
Wire-Structured Object 3D Point Cloud Filtering Using a Transformer Model
V. Kniaz, V. Knyaz, T. Skrypitsyna, P. Moshkantsev, A. Bordodymov
The rapid reconstruction of partially destroyed cultural heritage objects is crucial in architectural history. Many significant structures have suffered damage from erosion, earthquakes, or human activity, often leaving only the armature intact. Simplified 3D reconstruction techniques using digital cameras and laser rangefinders are essential for these monuments, frequently located in abandoned areas. However, interior surfaces visible through exterior openings complicate reconstruction by introducing outliers in the 3D point cloud. This paper introduces the WireNetV3 model for precise 3D segmentation of wire structures in color images. The model distinguishes between front and interior surfaces, filtering outliers during feature matching. Building on SegFormer 3D and WireNetV2, our approach integrates transformers with task-specific features and introduces a novel loss function, WireSDF, for distance calculation from wire axes. Evaluations on datasets featuring the Shukhov Tower and a church dome demonstrate that WireNetV3 surpasses existing methods in Intersection-over-Union metrics and 3D model accuracy.
{"title":"Wire-Structured Object 3D Point Cloud Filtering Using a Transformer Model","authors":"V. Kniaz, V. Knyaz, T. Skrypitsyna, P. Moshkantsev, A. Bordodymov","doi":"10.3103/S1060992X25601812","DOIUrl":"10.3103/S1060992X25601812","url":null,"abstract":"<p>The rapid reconstruction of partially destroyed cultural heritage objects is crucial in architectural history. Many significant structures have suffered damage from erosion, earthquakes, or human activity, often leaving only the armature intact. Simplified 3D reconstruction techniques using digital cameras and laser rangefinders are essential for these monuments, frequently located in abandoned areas. However, interior surfaces visible through exterior openings complicate reconstruction by introducing outliers in the 3D point cloud. This paper introduces the <span>WireNetV3</span> model for precise 3D segmentation of wire structures in color images. The model distinguishes between front and interior surfaces, filtering outliers during feature matching. Building on SegFormer 3D and <span>WireNetV2</span>, our approach integrates transformers with task-specific features and introduces a novel loss function, WireSDF, for distance calculation from wire axes. Evaluations on datasets featuring the Shukhov Tower and a church dome demonstrate that <span>WireNetV3</span> surpasses existing methods in Intersection-over-Union metrics and 3D model accuracy.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S175 - S184"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19, DOI: 10.3103/S1060992X25601666
Interaction between Learning and Evolution at the Formation of Functional Systems
V. G. Red’ko, M. S. Burtsev
In the present work, a model of the interaction between learning and evolution in the formation of functional systems is constructed and studied. The behavior of a population of learning agents is analyzed. The agent’s control system consists of a set of functional systems, each of which includes a set of elements. The presence or absence of an element in a given functional system is encoded by the binary symbols 1 or 0. Each agent has a genotype and a phenotype, both encoded by chains of binary symbols representing the concatenated chains of functional systems. A functional system is completely formed when all of its elements are present. The more completely formed functional systems an agent has, the higher its fitness. The evolution of the population proceeds in generations. During each generation, the genotypes of agents do not change, while the phenotypes are optimized via learning, namely, via the formation of new functional systems. At the beginning of a generation, an agent’s phenotype equals its genotype. At the end of the generation, the number of completely formed functional systems in the agent’s phenotype is determined; the larger this number, the higher the agent’s fitness. Agents are selected into a new generation with probabilities proportional to their fitness. A descendant agent receives the genotype of its parent (with small mutations). Thus, agents are selected according to their phenotypes, which are optimized by learning, while their genotypes are inherited. The model was studied by computer simulation, and the effects of the interaction between learning and evolution in the formation of functional systems were analyzed.
{"title":"Interaction between Learning and Evolution at the Formation of Functional Systems","authors":"V. G. Red’ko, M. S. Burtsev","doi":"10.3103/S1060992X25601666","DOIUrl":"10.3103/S1060992X25601666","url":null,"abstract":"<p>In the present work, a model of the interaction between learning and evolution at the formation of functional systems is constructed and studied. The behavior of a population of learning agents is analyzed. The agent’s control system consists of a set of functional systems. Each functional system includes a set of elements. The presence or absence of an element in the considered functional system is encoded by binary symbols 1 or 0. Each agent has a genotype and phenotype, which are encoded by chains of binary symbols and represent the combined chains of functional systems. A functional system is completely formed when all its elements are present in it. The more is the number of completely formed functional systems that an agent has, the higher is the agent’s fitness. The evolution of a population of agents consists of generations. During each generation, the genotypes of agents do not change, and the phenotypes are optimized via learning, namely, via the formation of new functional systems. The phenotype of an agent at the beginning of a generation is equal to its genotype. At the end of the generation, the number of functional systems in the agent’s phenotype is determined; the larger is this number, the higher is the agent’s fitness. Agents are selected into a new generation with probabilities that are proportional to their fitness. The descendant agent receives the genotype of the parent agent (with small mutations). Thus, the selection of agents occurs in accordance with their phenotypes, which are optimized by learning, and the genotypes of agents are inherited. The model was studied by computer simulation; the effects of the interaction between learning and evolution in the processes of formation of functional systems were analyzed.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S30 - S46"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19, DOI: 10.3103/S1060992X25601654
Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding
M. A. Patratskiy, A. K. Kovalev, A. I. Panov
Vision-Language-Action models have demonstrated remarkable capabilities in predicting agent movements within virtual environments and real-world scenarios based on visual observations and textual instructions. Although recent research has focused on enhancing spatial and temporal understanding independently, this paper presents a novel approach that integrates both aspects through visual prompting. We introduce a method that projects visual traces of key points from observations onto depth maps, enabling models to capture spatial and temporal information simultaneously. Experiments in SimplerEnv show that the mean number of successfully solved tasks increased by 4% compared to SpatialVLA and by 19% compared to TraceVLA. Furthermore, we show that this enhancement can be achieved with minimal training data, making it particularly valuable for real-world applications where data collection is challenging. The project page is available at https://ampiromax.github.io/ST-VLA.
{"title":"Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding","authors":"M. A. Patratskiy, A. K. Kovalev, A. I. Panov","doi":"10.3103/S1060992X25601654","DOIUrl":"10.3103/S1060992X25601654","url":null,"abstract":"<p>Vision-Language-Action models have demonstrated remarkable capabilities in predicting agent movements within virtual environments and real-world scenarios based on visual observations and textual instructions. Although recent research has focused on enhancing spatial and temporal understanding independently, this paper presents a novel approach that integrates both aspects through visual prompting. We introduce a method that projects visual traces of key points from observations onto depth maps, enabling models to capture both spatial and temporal information simultaneously. The experiments in SimplerEnv show that the mean number of tasks successfully solved increased for 4% compared to SpatialVLA and 19% compared to TraceVLA. Furthermore, we show that this enhancement can be achieved with minimal training data, making it particularly valuable for real-world applications where data collection is challenging. The project page is available at https://ampiromax.github.io/ST-VLA.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S72 - S82"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X24601921
Decoding EEG Data with Deep Learning for Intelligence Quotient Assessment
Prithwijit Mukherjee, Anisha Halder Roy
Intelligence quotient (IQ) serves as a statistical gauge for evaluating an individual’s cognitive prowess. Measuring IQ is a formidable undertaking, mainly due to the intricacy of the human brain’s composition. Presently, the assessment of human intelligence relies solely on conventional paper-based psychometric tests. However, these approaches suffer from inherent discrepancies arising from the diversity of test formats and language barriers. The primary objective of this study is to introduce an innovative, deep learning-driven methodology for IQ measurement using electroencephalogram (EEG) signals. In this investigation, EEG signals are captured from participants during an IQ assessment session. Participants’ IQ levels are then categorized into six distinct tiers based on their test results: extremely low, borderline, low average, high average, superior, and very superior. An attention mechanism-based Convolutional Neural Network–modified-tanh Long Short-Term Memory (CNN-MTLSTM) model has been devised to classify individuals into these IQ categories using EEG signals. A layer named the “input enhancement layer” is proposed and incorporated into the CNN-MTLSTM to enhance its prediction accuracy. Notably, a CNN is harnessed to automate the extraction of important information from the EEG features, and a new model, the MTLSTM, is proposed as the classifier. The paper’s contributions encompass the novel MTLSTM architecture and the use of an attention mechanism to enhance the classification accuracy of the CNN-MTLSTM model. The CNN-MTLSTM model, incorporating an attention mechanism within the MTLSTM network, attains a remarkable average accuracy of 97.41% in assessing a person’s IQ level.
{"title":"Decoding EEG Data with Deep Learning for Intelligence Quotient Assessment","authors":"Prithwijit Mukherjee, Anisha Halder Roy","doi":"10.3103/S1060992X24601921","DOIUrl":"10.3103/S1060992X24601921","url":null,"abstract":"<p>Intelligence quotient (IQ) serves as a statistical gauge for evaluating an individual’s cognitive prowess. Measuring IQ is a formidable undertaking, mainly due to the intricate intricacies of the human brain’s composition. Presently, the assessment of human intelligence relies solely on conventional paper-based psychometric tests. However, these approaches suffer from inherent discrepancies arising from the diversity of test formats and language barriers. The primary objective of this study is to introduce an innovative, deep learning-driven methodology for IQ measurement using Electroencephalogram (EEG) signals. In this investigation, EEG signals are captured from participants during an IQ assessment session. Subsequently, participants' IQ levels are categorized into six distinct tiers, encompassing extremely low IQ, borderline IQ, low average IQ, high average IQ, superior IQ, and very superior IQ, based on their test results. An attention mechanism-based Convolution Neural Network-modified tanh Long-Short-term-Memory (CNN-MTLSTM) model has been meticulously devised for adeptly classifying individuals into the aforementioned IQ categories by using EEG signals. A layer named 'input enhancement layer' is proposed and incorporated in CNN-MTLSTM for enhancing its prediction accuracy. Notably, a CNN is harnessed to automate the process of extracting important information from the extracted EEG features. A new model, i.e., MTLSTM, is proposed, which works as a classifier. The paper’s contributions encompass proposing the novel MTLSTM architecture and leveraging attention mechanism to enhance the classification accuracy of the CNN-MTLSTM model. The innovative CNN-MTLSTM model, incorporating an attention mechanism within the MTLSTM network, attains a remarkable average accuracy of 97.41% in assessing a person’s IQ level.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"441 - 456"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X25600673
Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph
S. Linok, G. Naumov
We propose the OVIGo-3DHSG method: Open-Vocabulary Indoor Grounding of objects using a 3D Hierarchical Scene Graph. OVIGo-3DHSG represents an extensive indoor environment as a hierarchical scene graph derived from sequences of RGB-D frames, utilizing a set of open-vocabulary foundation models and sensor data processing. The hierarchical representation explicitly models spatial relations across floors, rooms, locations, and objects. To effectively address complex queries involving spatial references to other objects, we integrate the hierarchical scene graph with a Large Language Model for multistep reasoning. This integration leverages inter-layer (e.g., room-to-object) and intra-layer (e.g., object-to-object) connections, enhancing spatial contextual understanding. We investigate the semantic and geometric accuracy of the hierarchical representation on Habitat Matterport 3D Semantic multi-floor scenes. Our approach demonstrates efficient scene comprehension and robust object grounding compared to existing methods. Overall, OVIGo-3DHSG demonstrates strong potential for applications requiring spatial reasoning and understanding of indoor environments. Related materials can be found at https://github.com/linukc/OVIGo-3DHSG.
{"title":"Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph","authors":"S. Linok, G. Naumov","doi":"10.3103/S1060992X25600673","DOIUrl":"10.3103/S1060992X25600673","url":null,"abstract":"<p>We propose <b>OVIGo-3DHSG</b> method—<b>O</b>pen-<b>V</b>ocabulary <b>I</b>ndoor <b>G</b>rounding of <b>o</b>bjects using <b>3D</b> <b>H</b>ierarchical <b>S</b>cene <b>G</b>raph. OVIGo-3DHSG represents an extensive indoor environment over a Hierarchical Scene Graph derived from sequences of RGB-D frames utilizing a set of open-vocabulary foundation models and sensor data processing. The hierarchical representation explicitly models spatial relations across floors, rooms, locations, and objects. To effectively address complex queries involving spatial reference to other objects, we integrate the hierarchical scene graph with a Large Language Model for multistep reasoning. This integration leverages inter-layer (e.g., room-to-object) and intra-layer (e.g., object-to-object) connections, enhancing spatial contextual understanding. We investigate the semantic and geometry accuracy of hierarchical representation on Habitat Matterport 3D Semantic multi-floor scenes. Our approach demonstrates efficient scene comprehension and robust object grounding compared to existing methods. Overall OVIGo-3DHSG demonstrates strong potential for applications requiring spatial reasoning and understanding of indoor environments. Related materials can be found at https://github.com/linukc/OVIGo-3DHSG.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"323 - 333"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X25700122
AS-ODB: Multivariate Attention Supervised Learning Based Optimized DBN Approach for Cloud Workload Prediction
G. M. Kiran, A. Aparna Rajesh, D. Basavesha
On-demand cloud computing makes it feasible to access a centralized, shared pool of computing resources. Accurate estimation of cloud workload is necessary for optimal performance and effective use of cloud computing resources. Because cloud workloads are dynamic and unpredictable, this is a challenging problem. Deep learning, when trained appropriately, can provide a reliable foundation for workload prediction in data centres. In the proposed model, efficient workload prediction is carried out using a novel deep learning approach; effective management of the model’s hyperparameters may significantly improve its performance. Using the data centre’s workload traces at many consecutive time steps, the suggested approach is shown to be able to estimate Central Processing Unit (CPU) utilization. The model collects raw data retrieved from storage, including the number and type of requests, virtual machine (VM) costs, and resource usage. The data are preprocessed to discover patterns and oscillations in the workload trace and thereby increase the model’s predictive efficacy. During preprocessing, the KCR approach, min-max normalization, and data cleaning are used to select important properties from the raw data samples, eliminate noise, and normalize them. A sliding window then converts the multivariate data into time series for supervised learning. Finally, a deep belief network based on green anaconda optimization (GrA-DBN) is used to attain precise workload forecasting. Experimental results show that, compared with existing models, the suggested methodology provides a better trade-off between accuracy and training time, achieving an execution time of 28.5 s and an accuracy rate of 93.60%. According to the simulation results, the GrA-DBN workload prediction method performs better than other algorithms.
{"title":"AS-ODB: Multivariate Attention Supervised Learning Based Optimized DBN Approach for Cloud Workload Prediction","authors":"G. M. Kiran, A. Aparna Rajesh, D. Basavesha","doi":"10.3103/S1060992X25700122","DOIUrl":"10.3103/S1060992X25700122","url":null,"abstract":"<p>Attainable on demand cloud computing makes it feasible to access a centralized shared pool of computing resources. Accurate estimation of cloud workload is necessary for optimal performance and effective use of cloud computing resources. Because cloud workloads are dynamic and unpredictable, this is a problematic problem. In this case, deep learning can provide reliable foundations for workload prediction in data centres when trained appropriately. In the proposed model, efficient workload prediction is executed out using novel deep learning. Efficient management of these hyperparameters may significantly improve the neural network model’s performance. Using the data centre’s workload traces at many consecutive time steps, the suggested approach is shown to be able to estimate Central Processing Unit (CPU) utilization. Collects raw data retrieved from the storage, including the number and type of requests, virtual machine (VMs) costs, and resource usage. Discover patterns and oscillations in the workload trace by preprocessing the data to increase the prediction efficacy of this model. During data pre-processing, the KCR approach, min max normalization, and data cleaning are used to select the important properties from raw data samples, eliminate noise, and normalize them. After that, a sliding window is used for deep learning processing to convert multivariate data into time series with supervised learning. Next, utilize a deep belief network based on green anaconda optimization (GrA-DBN) to attain precise workload forecasting. Comparing the suggested methodology with existing models, experimental results show that it provides a better trade-off between accuracy and training time. The suggested method provides higher performance, with an execution time of 28.5 s and an accuracy rate of 93.60%. According to the simulation results, the GrA-DBN workload prediction method performs better than other algorithms.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"389 - 401"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-17, DOI: 10.3103/S1060992X25700092
M3DMap: Object-Aware Multimodal 3D Mapping for Dynamic Environments
D. A. Yudin
3D mapping in dynamic environments poses a challenge for modern researchers in robotics and autonomous transportation. There are no universal representations for dynamic 3D scenes that incorporate multimodal data such as images, point clouds, and text. This article takes a step toward solving this problem. It proposes a taxonomy of methods for constructing multimodal 3D maps, classifying contemporary approaches by scene types and representations, learning methods, and practical applications. Using this taxonomy, a brief structured analysis of recent methods is provided. The article also describes an original modular method called M3DMap, designed for object-aware construction of multimodal 3D maps of both static and dynamic scenes. It consists of several interconnected components: a neural multimodal object segmentation and tracking module; an odometry estimation module, including trainable algorithms; a module for 3D map construction and updating, with various implementations depending on the desired scene representation; and a multimodal data retrieval module. The article highlights original implementations of these modules and their advantages in solving various practical tasks, from 3D object grounding to mobile manipulation. Additionally, it presents theoretical propositions demonstrating the positive effect of using multimodal data and modern foundation models in 3D mapping methods. Details of the taxonomy and method implementation are available at https://yuddim.github.io/M3DMap.
{"title":"M3DMap: Object-Aware Multimodal 3D Mapping for Dynamic Environments","authors":"D. A. Yudin","doi":"10.3103/S1060992X25700092","DOIUrl":"10.3103/S1060992X25700092","url":null,"abstract":"<p>3D mapping in dynamic environments poses a challenge for modern researchers in robotics and autonomous transportation. There are no universal representations for dynamic 3D scenes that incorporate multimodal data such as images, point clouds, and text. This article takes a step toward solving this problem. It proposes a taxonomy of methods for constructing multimodal 3D maps, classifying contemporary approaches based on scene types and representations, learning methods, and practical applications. Using this taxonomy, a brief structured analysis of recent methods is provided. The article also describes an original modular method called M3DMap, designed for object-aware construction of multimodal 3D maps for both static and dynamic scenes. It consists of several interconnected components: a neural multimodal object segmentation and tracking module; an odometry estimation module, including trainable algorithms; a module for 3D map construction and updating with various implementations depending on the desired scene representation; and a multimodal data retrieval module. The article highlights original implementations of these modules and their advantages in solving various practical tasks, from 3D object grounding to mobile manipulation. Additionally, it presents theoretical propositions demonstrating the positive effect of using multimodal data and modern foundational models in 3D mapping methods. Details of the taxonomy and method implementation are available at https://yuddim.github.io/M3DMap.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"285 - 312"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}