Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601770
A. V. Demidovskij, A. I. Trutnev
Large Language Models (LLMs) are widely employed across a broad range of applications due to their versatility and state-of-the-art performance. However, as usage scenarios grow, there is a pressing demand for task-specific adaptation of LLMs through fine-tuning. While full fine-tuning (FT) remains preferred in terms of quality, its high memory and computation requirements limit its practical use, especially for large models. Parameter-efficient fine-tuning (PEFT) techniques, such as LoRA, mitigate this issue by updating a small subset of model parameters; however, they still require substantial resources because they rely on backpropagation. In contrast, zeroth-order (ZO) optimization methods, which approximate gradients using only forward passes, offer an attractive alternative for memory-constrained environments: by eliminating the need for backpropagation, they reduce memory overhead to inference-level footprints. During 2024–2025, several ZO techniques have been proposed that aim to balance efficiency and performance. This paper presents a comparative analysis of 12 zeroth-order optimization methods applied to LLM fine-tuning, evaluated by memory utilization, quality, fine-tuning time, and convergence. According to the results, ZO-SGD-Sign achieves the largest memory reduction (42.82%); among zeroth-order methods, LoHO comes closest to SGD in quality and fine-tuning time (a 0.6% quality drop and an 11.73% increase in fine-tuning time), while no ZO method currently matches the convergence efficiency of Adam and AdamW.
{"title":"Performance Study of Modern Zeroth-Order Optimization Methods for LLM Fine-Tuning","authors":"A. V. Demidovskij, A. I. Trutnev","doi":"10.3103/S1060992X25601770","DOIUrl":"10.3103/S1060992X25601770","url":null,"abstract":"<p>Large Language Models (LLMs) are widely employed across a broad range of applications due to their versatility and state-of-the-art performance. However, as usage scenarios grow, there is a pressing demand for task-specific adaptation of LLMs through fine-tuning. While full fine-tuning (FT) remains the most preferred in terms of quality, its high memory and computation requirements limit its practical use, especially for LLMs. Parameter-efficient fine-tuning (PEFT) techniques, such as LoRA, mitigate this issue by updating a small subset of model parameters. However, it requires an extensive number of resources due to backpropagation. In contrast, zeroth-order (ZO) optimization methods, which approximate gradients using only forward passes, offer an attractive alternative for memory-constrained environments by eliminating the need for backpropagation, thus reducing memory overhead to inference-level footprints. Over the 2024–2025 year, several ZO techniques have been proposed, aiming to balance efficiency and performance. This paper introduces the comparative analysis of 12 zeroth-order optimization methods applied for the LLM fine-tuning task by memory utilization, quality, fine-tuning time, and convergence. 
According to the results, the best method in terms of memory reduction is ZO-SGD-Sign: 42.82% memory reduction; the best quality and fine-tuning time across zeroth-order methods compared to SGD is achieved with LoHO: 0.6% quality drop and 11.73% fine-tuning time increase, while no ZO method currently matches the Adam and AdamW convergence efficiency.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S16 - S29"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
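The forward-only gradient approximation that ZO methods rely on can be illustrated with the classic two-point estimator on a toy quadratic loss. This is a generic scheme under invented names and hyperparameters, not the paper's implementation:

```python
import random

def zo_gradient(loss, theta, eps=1e-3, seed=0):
    """Two-point zeroth-order estimate: g ~ (L(t + eps*z) - L(t - eps*z)) / (2*eps) * z,
    with z drawn from a standard normal. Only two forward passes are needed."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = loss([t + eps * zi for t, zi in zip(theta, z)])
    minus = loss([t - eps * zi for t, zi in zip(theta, z)])
    scale = (plus - minus) / (2 * eps)
    return [scale * zi for zi in z]

def zo_sgd(loss, theta, lr=0.05, steps=300):
    """ZO-SGD: plain SGD driven by the forward-only gradient estimate."""
    for step in range(steps):
        g = zo_gradient(loss, theta, seed=step)  # fresh perturbation per step
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

# Toy quadratic loss; the iterates should approach the minimum at zero.
quad = lambda th: sum(t * t for t in th)
theta0 = [1.0, -2.0, 0.5]
theta = zo_sgd(quad, theta0)
```

Because the perturbation can be regenerated from its seed, such methods store no activations and little optimizer state, which is the source of the inference-level memory footprint mentioned above.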
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601733
M. Kairov, A. Bulatov, Yu. Kuratov
A fundamental limitation of Transformer-based models is their quadratic computational complexity with respect to input length, which limits their applicability to long-context tasks. Recurrent Memory Transformer (RMT) addresses this by introducing a memory mechanism that enables segment-wise recurrent processing. However, RMT relies on a multi-stage training curriculum that increases computational costs and complexity during fine-tuning. In this work, we propose the Recurrent Memory Transformer with a Memory Stream (RMT-MS), a novel architecture with layer-wise memory states and horizontal memory connections across segments. These mechanisms increase memory capacity and improve information flow, reducing the need for curriculum learning. We evaluate RMT-MS alongside RMT and ARMT on three long-context tasks: associative retrieval, BABILong QA1, and QA3. Our experiments show that RMT-MS achieves strong performance in single-stage training, matching curriculum-trained baselines on simpler tasks, and narrowing the gap on more complex ones. These results highlight the potential of RMT-MS for efficient long-context modeling without costly training schedules.
{"title":"Memory Stream: Enhancing Information Flow in Recurrent Memory Transformers for Efficient Long-Context Training","authors":"M. Kairov, A. Bulatov, Yu. Kuratov","doi":"10.3103/S1060992X25601733","DOIUrl":"10.3103/S1060992X25601733","url":null,"abstract":"<p>A fundamental limitation of Transformer-based models is their quadratic computational complexity with respect to input length, which limits their applicability to long-context tasks. Recurrent Memory Transformer (RMT) addresses this by introducing a memory mechanism that enables segment-wise recurrent processing. However, RMT relies on a multi-stage training curriculum that increases computational costs and complexity during fine-tuning. In this work, we propose the Recurrent Memory Transformer with a Memory Stream (RMT-MS), a novel architecture with layer-wise memory states and horizontal memory connections across segments. These mechanisms increase memory capacity and improve information flow, reducing the need for curriculum learning. We evaluate RMT-MS alongside RMT and ARMT on three long-context tasks: associative retrieval, BABILong QA1, and QA3. Our experiments show that RMT-MS achieves strong performance in single-stage training, matching curriculum-trained baselines on simpler tasks, and narrowing the gap on more complex ones. 
These results highlight the potential of RMT-MS for efficient long-context modeling without costly training schedules.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S158 - S165"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
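The segment-wise recurrence with layer-wise memory states and horizontal memory connections can be pictured with a toy scalar model. The update rules and names below are invented for illustration; the actual RMT-MS operates on token embeddings inside a Transformer:

```python
import math

def process_segment(segment, memories):
    """Toy per-layer update: each layer mixes its own memory (horizontal
    connection across segments) with the incoming signal and passes the
    result up to the next layer (vertical flow)."""
    x = sum(segment) / len(segment)           # stand-in for segment encoding
    new_memories = []
    for m in memories:                        # one memory state per layer
        m = math.tanh(0.5 * m + 0.5 * x)      # carry memory into next segment
        new_memories.append(m)
        x = m                                 # feed the layer above
    return new_memories

def run_recurrent(sequence, seg_len=4, n_layers=3):
    """Process a long sequence segment by segment, threading the memories."""
    memories = [0.0] * n_layers
    for start in range(0, len(sequence), seg_len):
        memories = process_segment(sequence[start:start + seg_len], memories)
    return memories

mems = run_recurrent([0.1 * i for i in range(16)])
```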
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601691
M. Dashiev, N. Zheludkov, I. Karandashev
Accurate critical path delay estimation plays a vital role in reducing unnecessary routing iterations and identifying potentially unsuccessful design runs early in the flow. This study proposes an architecture that integrates graph representations derived from netlists of digital complex functional blocks with design constraints, leveraging a multi-head cross-attention mechanism. This architecture significantly improves the accuracy of critical path delay estimation compared to the standard tools provided by the OpenROAD EDA suite. The mean absolute percentage error (MAPE) of the standard OpenROAD tool, OpenSTA, is 12.60%, whereas our algorithm achieves a substantially lower error of 7.57%. A comparison of various architectures was conducted, along with an investigation into the impact of incorporating netlist-derived information.
{"title":"Leveraging Graph Representations to Enhance Critical Path Delay Prediction in Digital Complex Functional Blocks Using Neural Networks","authors":"M. Dashiev, N. Zheludkov, I. Karandashev","doi":"10.3103/S1060992X25601691","DOIUrl":"10.3103/S1060992X25601691","url":null,"abstract":"<p>Accurate critical path delay estimation plays a vital role in reducing unnecessary routing iterations and identifying potentially unsuccessful design runs early in the flow. This study proposes an architecture that integrates graph representations derived from digital complex functional blocks netlist and design constraints, leveraging a Multi-head cross-attention mechanism. This architecture significantly improves the accuracy of critical path delay estimation compared to standard tools provided by the OpenROAD EDA. The mean absolute percentage error (MAPE) of the OpenRoad standard tool—openSTA is 12.60%, whereas our algorithm achieves a substantially lower error of 7.57%. A comparison of various architectures was conducted, along with an investigation into the impact of incorporating netlist-derived information.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S135 - S147"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25700195
H. Shen, V. S. Smolin
The problem of approximating nonlinear vector transformations using neural network algorithms is considered. Beyond the approximation itself, one of the reasons why optimization algorithms reach local rather than global minima of the loss function is identified: the “switching off,” or “death,” of a significant number of neurons during training. A multidimensional neural mapping algorithm is proposed, implemented in software, and numerically investigated to drastically reduce the influence of this factor on approximation accuracy. The theory and results of numerical experiments on approximation using neural mapping are presented.
{"title":"Deep Mapping Algorithm for More Effective Neural Network Training","authors":"H. Shen, V. S. Smolin","doi":"10.3103/S1060992X25700195","DOIUrl":"10.3103/S1060992X25700195","url":null,"abstract":"<p>The problem of approximating nonlinear vector transformations using neural network algorithms is considered. In addition to approximation, one of the reasons for algorithms reaching local minima rather than global minima of the loss function during optimization is identified: the “switching off” or “death” of a significant number of neurons during training. A multidimensional neural mapping algorithm is proposed, programmatically implemented, and numerically investigated to drastically reduce the influence of this factor on approximation accuracy. The theory and results of numerical experiments on approximation using neural mapping are presented.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S83 - S93"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601708
O. Matykina, D. Yudin
Three-dimensional object detection is essential for autonomous driving and robotics, relying on effective fusion of multimodal data from cameras and radar. This work proposes RCDINO, a multimodal transformer-based model that enhances visual backbone features by fusing them with semantically rich representations from the pretrained DINOv2 foundation model. This approach enriches visual representations and improves the model’s detection performance while preserving compatibility with the baseline architecture. Experiments on the nuScenes dataset demonstrate that RCDINO achieves state-of-the-art performance among radar–camera models, with 56.4 NDS and 48.1 mAP. Our implementation is available at https://github.com/OlgaMatykina/RCDINO.
{"title":"RCDINO: Enhancing Radar–Camera 3D Object Detection with DINOv2 Semantic Features","authors":"O. Matykina, D. Yudin","doi":"10.3103/S1060992X25601708","DOIUrl":"10.3103/S1060992X25601708","url":null,"abstract":"<p>Three-dimensional object detection is essential for autonomous driving and robotics, relying on effective fusion of multimodal data from cameras and radar. This work proposes RCDINO, a multimodal transformer-based model that enhances visual backbone features by fusing them with semantically rich representations from the pretrained DINOv2 foundation model. This approach enriches visual representations and improves the model’s detection performance while preserving compatibility with the baseline architecture. Experiments on the nuScenes dataset demonstrate that RCDINO achieves state-of-the-art performance among radar–camera models, with 56.4 NDS and 48.1 mAP. Our implementation is available at https://github.com/OlgaMatykina/RCDINO.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S47 - S57"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601812
V. Kniaz, V. Knyaz, T. Skrypitsyna, P. Moshkantsev, A. Bordodymov
The rapid reconstruction of partially destroyed cultural heritage objects is crucial in architectural history. Many significant structures have suffered damage from erosion, earthquakes, or human activity, often leaving only the armature intact. Simplified 3D reconstruction techniques using digital cameras and laser rangefinders are essential for these monuments, frequently located in abandoned areas. However, interior surfaces visible through exterior openings complicate reconstruction by introducing outliers in the 3D point cloud. This paper introduces the WireNetV3 model for precise 3D segmentation of wire structures in color images. The model distinguishes between front and interior surfaces, filtering outliers during feature matching. Building on SegFormer 3D and WireNetV2, our approach integrates transformers with task-specific features and introduces a novel loss function, WireSDF, for distance calculation from wire axes. Evaluations on datasets featuring the Shukhov Tower and a church dome demonstrate that WireNetV3 surpasses existing methods in Intersection-over-Union metrics and 3D model accuracy.
{"title":"Wire-Structured Object 3D Point Cloud Filtering Using a Transformer Model","authors":"V. Kniaz, V. Knyaz, T. Skrypitsyna, P. Moshkantsev, A. Bordodymov","doi":"10.3103/S1060992X25601812","DOIUrl":"10.3103/S1060992X25601812","url":null,"abstract":"<p>The rapid reconstruction of partially destroyed cultural heritage objects is crucial in architectural history. Many significant structures have suffered damage from erosion, earthquakes, or human activity, often leaving only the armature intact. Simplified 3D reconstruction techniques using digital cameras and laser rangefinders are essential for these monuments, frequently located in abandoned areas. However, interior surfaces visible through exterior openings complicate reconstruction by introducing outliers in the 3D point cloud. This paper introduces the <span>WireNetV3</span> model for precise 3D segmentation of wire structures in color images. The model distinguishes between front and interior surfaces, filtering outliers during feature matching. Building on SegFormer 3D and <span>WireNetV2</span>, our approach integrates transformers with task-specific features and introduces a novel loss function, WireSDF, for distance calculation from wire axes. 
Evaluations on datasets featuring the Shukhov Tower and a church dome demonstrate that <span>WireNetV3</span> surpasses existing methods in Intersection-over-Union metrics and 3D model accuracy.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S175 - S184"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
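The WireSDF loss is built around distances from points to wire axes. Its geometric core, point-to-segment distance in 3D, can be sketched as follows (an illustrative primitive, not the authors' loss function):

```python
import math

def dist_to_axis(p, a, b):
    """Distance from 3D point p to the wire-axis segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    t = sum(x * y for x, y in zip(ap, ab)) / sum(c * c for c in ab)
    t = max(0.0, min(1.0, t))                      # clamp onto the segment
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)

# Unit segment along x: a point above its middle and a point past its end.
d_side = dist_to_axis([0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
d_past = dist_to_axis([2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
```

Points whose distance to every wire axis exceeds a threshold would then be candidates for outlier filtering.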
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601666
V. G. Red’ko, M. S. Burtsev
In the present work, a model of the interaction between learning and evolution during the formation of functional systems is constructed and studied. The behavior of a population of learning agents is analyzed. The agent’s control system consists of a set of functional systems, each of which includes a set of elements. The presence or absence of an element in a given functional system is encoded by the binary symbols 1 or 0. Each agent has a genotype and a phenotype, which are encoded by chains of binary symbols and represent the combined chains of functional systems. A functional system is completely formed when all its elements are present. The more completely formed functional systems an agent has, the higher its fitness. The evolution of the population proceeds in generations. During each generation, the genotypes of agents do not change, while the phenotypes are optimized via learning, namely, via the formation of new functional systems. The phenotype of an agent at the beginning of a generation is equal to its genotype. At the end of the generation, the number of completely formed functional systems in the agent’s phenotype is determined; the larger this number, the higher the agent’s fitness. Agents are selected into a new generation with probabilities proportional to their fitness. A descendant agent receives the genotype of its parent (with small mutations). Thus, agents are selected according to their phenotypes, which are optimized by learning, while only their genotypes are inherited.
{"title":"Interaction between Learning and Evolution at the Formation of Functional Systems","authors":"V. G. Red’ko, M. S. Burtsev","doi":"10.3103/S1060992X25601666","DOIUrl":"10.3103/S1060992X25601666","url":null,"abstract":"<p>In the present work, a model of the interaction between learning and evolution at the formation of functional systems is constructed and studied. The behavior of a population of learning agents is analyzed. The agent’s control system consists of a set of functional systems. Each functional system includes a set of elements. The presence or absence of an element in the considered functional system is encoded by binary symbols 1 or 0. Each agent has a genotype and phenotype, which are encoded by chains of binary symbols and represent the combined chains of functional systems. A functional system is completely formed when all its elements are present in it. The more is the number of completely formed functional systems that an agent has, the higher is the agent’s fitness. The evolution of a population of agents consists of generations. During each generation, the genotypes of agents do not change, and the phenotypes are optimized via learning, namely, via the formation of new functional systems. The phenotype of an agent at the beginning of a generation is equal to its genotype. At the end of the generation, the number of functional systems in the agent’s phenotype is determined; the larger is this number, the higher is the agent’s fitness. Agents are selected into a new generation with probabilities that are proportional to their fitness. The descendant agent receives the genotype of the parent agent (with small mutations). Thus, the selection of agents occurs in accordance with their phenotypes, which are optimized by learning, and the genotypes of agents are inherited. 
The model was studied by computer simulation; the effects of the interaction between learning and evolution in the processes of formation of functional systems were analyzed.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S30 - S46"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
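The described scheme, binary genotypes and phenotypes split into functional-system blocks, within-generation learning that completes systems, and fitness-proportional selection that inherits only the genotype, can be sketched as a minimal simulation. Population size, number of learning attempts, and mutation rate below are arbitrary toy choices, not the paper's parameters:

```python
import random

rng = random.Random(0)
N_SYS, SYS_LEN, POP, GENS = 4, 3, 30, 40   # toy parameters

def fitness(phenotype):
    """Number of completely formed functional systems (all-ones blocks)."""
    return sum(all(phenotype[i * SYS_LEN:(i + 1) * SYS_LEN])
               for i in range(N_SYS))

def learn(genotype, tries=5):
    """Learning within a generation: the phenotype starts as the genotype,
    and a few elements of functional systems are formed (set to 1)."""
    phenotype = list(genotype)
    for _ in range(tries):
        phenotype[rng.randrange(len(phenotype))] = 1
    return phenotype

pop = [[rng.randint(0, 1) for _ in range(N_SYS * SYS_LEN)] for _ in range(POP)]
for _ in range(GENS):
    fits = [fitness(learn(g)) for g in pop]        # selection sees phenotypes
    parents = rng.choices(pop, weights=[f + 1e-9 for f in fits], k=POP)
    pop = [[(1 - b) if rng.random() < 0.01 else b for b in g]  # small mutations
           for g in parents]                       # only genotypes are inherited

best = max(fitness(learn(g)) for g in pop)
```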
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601654
M. A. Patratskiy, A. K. Kovalev, A. I. Panov
Vision-Language-Action models have demonstrated remarkable capabilities in predicting agent movements within virtual environments and real-world scenarios based on visual observations and textual instructions. Although recent research has focused on enhancing spatial and temporal understanding independently, this paper presents a novel approach that integrates both aspects through visual prompting. We introduce a method that projects visual traces of key points from observations onto depth maps, enabling models to capture spatial and temporal information simultaneously. Experiments in SimplerEnv show that the mean number of successfully solved tasks increased by 4% compared to SpatialVLA and by 19% compared to TraceVLA. Furthermore, we show that this enhancement can be achieved with minimal training data, making it particularly valuable for real-world applications where data collection is challenging. The project page is available at https://ampiromax.github.io/ST-VLA.
{"title":"Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding","authors":"M. A. Patratskiy, A. K. Kovalev, A. I. Panov","doi":"10.3103/S1060992X25601654","DOIUrl":"10.3103/S1060992X25601654","url":null,"abstract":"<p>Vision-Language-Action models have demonstrated remarkable capabilities in predicting agent movements within virtual environments and real-world scenarios based on visual observations and textual instructions. Although recent research has focused on enhancing spatial and temporal understanding independently, this paper presents a novel approach that integrates both aspects through visual prompting. We introduce a method that projects visual traces of key points from observations onto depth maps, enabling models to capture both spatial and temporal information simultaneously. The experiments in SimplerEnv show that the mean number of tasks successfully solved increased for 4% compared to SpatialVLA and 19% compared to TraceVLA. Furthermore, we show that this enhancement can be achieved with minimal training data, making it particularly valuable for real-world applications where data collection is challenging. The project page is available at https://ampiromax.github.io/ST-VLA.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S72 - S82"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-17 | DOI: 10.3103/S1060992X24601921
Prithwijit Mukherjee, Anisha Halder Roy
Intelligence quotient (IQ) serves as a statistical gauge for evaluating an individual’s cognitive abilities. Measuring IQ is a formidable undertaking, mainly due to the intricate structure of the human brain. Presently, the assessment of human intelligence relies solely on conventional paper-based psychometric tests. However, these approaches suffer from inherent discrepancies arising from the diversity of test formats and from language barriers. The primary objective of this study is to introduce an innovative, deep learning-driven methodology for IQ measurement using electroencephalogram (EEG) signals. EEG signals are captured from participants during an IQ assessment session. Based on their test results, participants’ IQ levels are categorized into six tiers: extremely low, borderline, low average, high average, superior, and very superior. An attention-based Convolutional Neural Network-modified tanh Long Short-Term Memory (CNN-MTLSTM) model is devised to classify individuals into these IQ categories from EEG signals. A layer named the “input enhancement layer” is proposed and incorporated into the CNN-MTLSTM to improve its prediction accuracy. A CNN is used to automate the extraction of important information from the EEG features, while the newly proposed MTLSTM works as the classifier. The paper’s contributions are the novel MTLSTM architecture and the use of an attention mechanism to enhance the classification accuracy of the CNN-MTLSTM model. Incorporating an attention mechanism within the MTLSTM network, the model attains an average accuracy of 97.41% in assessing a person’s IQ level.
{"title":"Decoding EEG Data with Deep Learning for Intelligence Quotient Assessment","authors":"Prithwijit Mukherjee, Anisha Halder Roy","doi":"10.3103/S1060992X24601921","DOIUrl":"10.3103/S1060992X24601921","url":null,"abstract":"<p>Intelligence quotient (IQ) serves as a statistical gauge for evaluating an individual’s cognitive prowess. Measuring IQ is a formidable undertaking, mainly due to the intricate intricacies of the human brain’s composition. Presently, the assessment of human intelligence relies solely on conventional paper-based psychometric tests. However, these approaches suffer from inherent discrepancies arising from the diversity of test formats and language barriers. The primary objective of this study is to introduce an innovative, deep learning-driven methodology for IQ measurement using Electroencephalogram (EEG) signals. In this investigation, EEG signals are captured from participants during an IQ assessment session. Subsequently, participants' IQ levels are categorized into six distinct tiers, encompassing extremely low IQ, borderline IQ, low average IQ, high average IQ, superior IQ, and very superior IQ, based on their test results. An attention mechanism-based Convolution Neural Network-modified tanh Long-Short-term-Memory (CNN-MTLSTM) model has been meticulously devised for adeptly classifying individuals into the aforementioned IQ categories by using EEG signals. A layer named 'input enhancement layer' is proposed and incorporated in CNN-MTLSTM for enhancing its prediction accuracy. Notably, a CNN is harnessed to automate the process of extracting important information from the extracted EEG features. A new model, i.e., MTLSTM, is proposed, which works as a classifier. The paper’s contributions encompass proposing the novel MTLSTM architecture and leveraging attention mechanism to enhance the classification accuracy of the CNN-MTLSTM model. 
The innovative CNN-MTLSTM model, incorporating an attention mechanism within the MTLSTM network, attains a remarkable average accuracy of 97.41% in assessing a person’s IQ level.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"441 - 456"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
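An attention mechanism placed over recurrent states typically scores each timestep, softmaxes the scores, and pools a weighted sum before the final classifier. A generic sketch with mean-based scores; the CNN-MTLSTM's scoring is learned and its details are not given here:

```python
import math

def attention_pool(states):
    """Score each timestep, softmax the scores, and return the weighted
    sum of states. Scores here are just the state means (illustrative)."""
    scores = [sum(s) / len(s) for s in states]
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * s[d] for w, s in zip(weights, states))
            for d in range(len(states[0]))]

# Two timesteps of 2-d recurrent states; the higher-scoring step dominates.
pooled = attention_pool([[1.0, 1.0], [3.0, 3.0]])
```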
Pub Date: 2025-09-17 | DOI: 10.3103/S1060992X25600673
S. Linok, G. Naumov
We propose the OVIGo-3DHSG method: Open-Vocabulary Indoor Grounding of objects using a 3D Hierarchical Scene Graph. OVIGo-3DHSG represents an extensive indoor environment as a hierarchical scene graph derived from sequences of RGB-D frames, utilizing a set of open-vocabulary foundation models and sensor data processing. The hierarchical representation explicitly models spatial relations across floors, rooms, locations, and objects. To effectively address complex queries involving spatial references to other objects, we integrate the hierarchical scene graph with a Large Language Model for multistep reasoning. This integration leverages inter-layer (e.g., room-to-object) and intra-layer (e.g., object-to-object) connections, enhancing spatial contextual understanding. We investigate the semantic and geometric accuracy of the hierarchical representation on Habitat Matterport 3D Semantic multi-floor scenes. Our approach demonstrates efficient scene comprehension and robust object grounding compared to existing methods. Overall, OVIGo-3DHSG demonstrates strong potential for applications requiring spatial reasoning and understanding of indoor environments. Related materials can be found at https://github.com/linukc/OVIGo-3DHSG.
{"title":"Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph","authors":"S. Linok, G. Naumov","doi":"10.3103/S1060992X25600673","DOIUrl":"10.3103/S1060992X25600673","url":null,"abstract":"<p>We propose <b>OVIGo-3DHSG</b> method—<b>O</b>pen-<b>V</b>ocabulary <b>I</b>ndoor <b>G</b>rounding of <b>o</b>bjects using <b>3D</b> <b>H</b>ierarchical <b>S</b>cene <b>G</b>raph. OVIGo-3DHSG represents an extensive indoor environment over a Hierarchical Scene Graph derived from sequences of RGB-D frames utilizing a set of open-vocabulary foundation models and sensor data processing. The hierarchical representation explicitly models spatial relations across floors, rooms, locations, and objects. To effectively address complex queries involving spatial reference to other objects, we integrate the hierarchical scene graph with a Large Language Model for multistep reasoning. This integration leverages inter-layer (e.g., room-to-object) and intra-layer (e.g., object-to-object) connections, enhancing spatial contextual understanding. We investigate the semantic and geometry accuracy of hierarchical representation on Habitat Matterport 3D Semantic multi-floor scenes. Our approach demonstrates efficient scene comprehension and robust object grounding compared to existing methods. Overall OVIGo-3DHSG demonstrates strong potential for applications requiring spatial reasoning and understanding of indoor environments. 
Related materials can be found at https://github.com/linukc/OVIGo-3DHSG.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 3","pages":"323 - 333"},"PeriodicalIF":0.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145073784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
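A hierarchical scene graph of the kind described, floors containing rooms containing objects, with inter-layer edges walked to ground a query, can be sketched as a tiny data structure. All names are illustrative; the real system also stores geometry and open-vocabulary features:

```python
from collections import defaultdict

class SceneGraph:
    """Tiny floor -> room -> object hierarchy; `ground` mirrors the
    inter-layer (room-to-object) traversal used for grounding queries."""
    def __init__(self):
        self.rooms_on = defaultdict(list)    # floor -> rooms
        self.objects_in = defaultdict(list)  # room -> objects

    def add(self, floor, room, obj):
        if room not in self.rooms_on[floor]:
            self.rooms_on[floor].append(room)
        self.objects_in[room].append(obj)

    def ground(self, obj):
        """Every (floor, room) pair that contains the named object."""
        return [(f, r) for f, rooms in self.rooms_on.items()
                for r in rooms if obj in self.objects_in[r]]

g = SceneGraph()
g.add("floor0", "kitchen", "mug")
g.add("floor0", "kitchen", "kettle")
g.add("floor1", "office", "mug")
hits = g.ground("mug")
```

An LLM front end would resolve a query like "the mug in the office" by first grounding "mug", then filtering the candidates by the room-level constraint.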