
Optical Memory and Neural Networks — Latest Publications

Performance Study of Modern Zeroth-Order Optimization Methods for LLM Fine-Tuning
IF 0.8 | Q4 OPTICS | Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601770
A. V. Demidovskij, A. I. Trutnev

Large Language Models (LLMs) are widely employed across a broad range of applications due to their versatility and state-of-the-art performance. However, as usage scenarios grow, there is a pressing demand for task-specific adaptation of LLMs through fine-tuning. While full fine-tuning (FT) remains preferred in terms of quality, its high memory and computation requirements limit its practical use, especially for LLMs. Parameter-efficient fine-tuning (PEFT) techniques, such as LoRA, mitigate this issue by updating a small subset of model parameters; however, they still demand substantial resources because they rely on backpropagation. In contrast, zeroth-order (ZO) optimization methods, which approximate gradients using only forward passes, offer an attractive alternative for memory-constrained environments: eliminating backpropagation reduces memory overhead to inference-level footprints. Over 2024–2025, several ZO techniques have been proposed that aim to balance efficiency and performance. This paper presents a comparative analysis of 12 zeroth-order optimization methods applied to LLM fine-tuning, evaluated on memory utilization, quality, fine-tuning time, and convergence. According to the results, the best method in terms of memory reduction is ZO-SGD-Sign (42.82% memory reduction); the best quality and fine-tuning time among zeroth-order methods relative to SGD are achieved by LoHO (0.6% quality drop and 11.73% fine-tuning time increase), while no ZO method currently matches the convergence efficiency of Adam and AdamW.
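The core ZO idea — approximating a gradient from forward passes only — can be sketched as a two-point estimator with a shared random perturbation. This is a generic illustration, not any of the 12 benchmarked methods; the function and parameter names (`loss_fn`, `mu`, `lr`) are illustrative:

```python
import numpy as np

def zo_gradient_step(params, loss_fn, lr=1e-3, mu=1e-3, rng=None):
    """One zeroth-order update: estimate the directional derivative from two
    forward passes with a shared random perturbation, then step along it."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(params.shape)        # random probe direction
    g = (loss_fn(params + mu * z) - loss_fn(params - mu * z)) / (2 * mu)
    return params - lr * g * z                   # descend along the probe

# toy quadratic with minimum at w = 3 (illustrative stand-in for a model loss)
loss = lambda w: float(np.sum((w - 3.0) ** 2))
w = np.zeros(4)
for _ in range(2000):
    w = zo_gradient_step(w, loss, lr=0.05, mu=1e-4)
```

Only two forward evaluations are needed per step, which is what keeps memory at an inference-level footprint.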

Optical Memory and Neural Networks, vol. 34, no. 1, pp. S16–S29.
Citations: 0
Memory Stream: Enhancing Information Flow in Recurrent Memory Transformers for Efficient Long-Context Training
IF 0.8 | Q4 OPTICS | Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601733
M. Kairov, A. Bulatov, Yu. Kuratov

A fundamental limitation of Transformer-based models is their quadratic computational complexity with respect to input length, which restricts their applicability to long-context tasks. The Recurrent Memory Transformer (RMT) addresses this by introducing a memory mechanism that enables segment-wise recurrent processing. However, RMT relies on a multi-stage training curriculum that increases computational cost and complexity during fine-tuning. In this work, we propose the Recurrent Memory Transformer with a Memory Stream (RMT-MS), a novel architecture with layer-wise memory states and horizontal memory connections across segments. These mechanisms increase memory capacity and improve information flow, reducing the need for curriculum learning. We evaluate RMT-MS alongside RMT and ARMT on three long-context tasks: associative retrieval, BABILong QA1, and QA3. Our experiments show that RMT-MS achieves strong performance in single-stage training, matching curriculum-trained baselines on simpler tasks and narrowing the gap on more complex ones. These results highlight the potential of RMT-MS for efficient long-context modeling without costly training schedules.
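The segment-wise recurrence that RMT builds on can be sketched with a toy memory vector carried across segments. Plain numpy stands in for the transformer layers; all names and the update rule here are illustrative, not the RMT-MS architecture:

```python
import numpy as np

def rmt_like_pass(tokens, segment_len, mem_dim, seed=0):
    """Process a long token sequence segment by segment, carrying a memory
    vector between segments (toy stand-in for RMT's memory tokens)."""
    rng = np.random.default_rng(seed)
    d = tokens.shape[1]
    w_read = 0.1 * rng.standard_normal((d, mem_dim))       # toy "read" layer
    w_mem = 0.1 * rng.standard_normal((mem_dim, mem_dim))  # toy memory update
    memory = np.zeros(mem_dim)
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        summary = np.tanh(segment.mean(axis=0) @ w_read)   # stand-in for attention
        memory = np.tanh(summary + memory @ w_mem)         # carry updated memory
    return memory

memory = rmt_like_pass(np.ones((10, 8)), segment_len=4, mem_dim=16)
```

Each segment sees only `segment_len` tokens plus the memory, so cost grows linearly with sequence length; RMT-MS additionally keeps per-layer memory states and horizontal connections, which this sketch omits.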

Optical Memory and Neural Networks, vol. 34, no. 1, pp. S158–S165.
Citations: 0
Leveraging Graph Representations to Enhance Critical Path Delay Prediction in Digital Complex Functional Blocks Using Neural Networks
IF 0.8 | Q4 OPTICS | Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601691
M. Dashiev, N. Zheludkov, I. Karandashev

Accurate critical path delay estimation plays a vital role in reducing unnecessary routing iterations and identifying potentially unsuccessful design runs early in the flow. This study proposes an architecture that integrates graph representations derived from the netlists of digital complex functional blocks with design constraints, leveraging a multi-head cross-attention mechanism. The architecture significantly improves the accuracy of critical path delay estimation compared to the standard tools provided by the OpenROAD EDA suite: the mean absolute percentage error (MAPE) of the standard OpenROAD tool, OpenSTA, is 12.60%, whereas our algorithm achieves a substantially lower error of 7.57%. We also compare several architectural variants and investigate the impact of incorporating netlist-derived information.
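MAPE, the metric used for the comparison above, is simply the mean of absolute relative errors expressed in percent:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error over paired true/predicted delays."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# e.g. two paths, each predicted with 10% relative error, give a MAPE of 10%
```

Note that MAPE weights errors relative to the true delay, so a fixed absolute error on a short path counts more than the same error on a long one.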

Optical Memory and Neural Networks, vol. 34, no. 1, pp. S135–S147.
Citations: 0
Deep Mapping Algorithm for More Effective Neural Network Training
IF 0.8 | Q4 OPTICS | Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25700195
H. Shen, V. S. Smolin

The problem of approximating nonlinear vector transformations using neural network algorithms is considered. Beyond approximation quality itself, one reason why algorithms reach local rather than global minima of the loss function during optimization is identified: the “switching off,” or “death,” of a significant number of neurons during training. A multidimensional neural mapping algorithm is proposed, implemented in software, and numerically investigated to drastically reduce the influence of this factor on approximation accuracy. The theory and the results of numerical experiments on approximation using neural mapping are presented.
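The neuron "death" the paper points to is easy to measure: a ReLU unit whose output stays at zero over an entire batch contributes nothing and receives no gradient. A minimal detector (an illustration of the phenomenon, not the paper's algorithm):

```python
import numpy as np

def dead_relu_fraction(relu_outputs):
    """relu_outputs: (batch, units) array of ReLU activations; a unit is
    'dead' on this batch if it never produces a positive output."""
    fired = (relu_outputs > 0).any(axis=0)
    return float(1.0 - fired.mean())

# unit 0 has negative pre-activations on every sample, so it never fires
acts = np.maximum(0.0, np.array([[-1.0, 0.5], [-2.0, 1.5]]))
```

Tracking this fraction during training is one way to see whether an optimizer is losing capacity to dead units.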

Optical Memory and Neural Networks, vol. 34, no. 1, pp. S83–S93.
Citations: 0
RCDINO: Enhancing Radar–Camera 3D Object Detection with DINOv2 Semantic Features
IF 0.8 | Q4 OPTICS | Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601708
O. Matykina, D. Yudin

Three-dimensional object detection is essential for autonomous driving and robotics, relying on effective fusion of multimodal data from cameras and radar. This work proposes RCDINO, a multimodal transformer-based model that enhances visual backbone features by fusing them with semantically rich representations from the pretrained DINOv2 foundation model. This approach enriches visual representations and improves the model’s detection performance while preserving compatibility with the baseline architecture. Experiments on the nuScenes dataset demonstrate that RCDINO achieves state-of-the-art performance among radar–camera models, with 56.4 NDS and 48.1 mAP. Our implementation is available at https://github.com/OlgaMatykina/RCDINO.
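The fusion step — augmenting backbone features with foundation-model features and projecting back to the backbone width — can be sketched as concatenation plus a linear projection. This is a simplification; the paper's actual fusion module, feature widths, and projection are assumptions here:

```python
import numpy as np

def fuse_features(backbone, dino, w_proj):
    """Concatenate per-location backbone and DINOv2-style features along the
    channel axis, then linearly project back to the backbone channel width."""
    fused = np.concatenate([backbone, dino], axis=-1)   # (..., c_b + c_d)
    return fused @ w_proj                               # (..., c_b)

rng = np.random.default_rng(0)
backbone = rng.standard_normal((100, 256))      # 100 spatial locations, 256 channels
dino = rng.standard_normal((100, 384))          # DINOv2-like feature width
w_proj = 0.01 * rng.standard_normal((640, 256)) # projection back to 256 channels
out = fuse_features(backbone, dino, w_proj)
```

Projecting back to the original width is what keeps the fused features compatible with the unchanged downstream detection head.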

Optical Memory and Neural Networks, vol. 34, no. 1, pp. S47–S57.
Citations: 0
Wire-Structured Object 3D Point Cloud Filtering Using a Transformer Model
IF 0.8 | Q4 OPTICS | Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601812
V. Kniaz, V. Knyaz, T. Skrypitsyna, P. Moshkantsev, A. Bordodymov

The rapid reconstruction of partially destroyed cultural heritage objects is crucial in architectural history. Many significant structures have suffered damage from erosion, earthquakes, or human activity, often leaving only the armature intact. Simplified 3D reconstruction techniques using digital cameras and laser rangefinders are essential for these monuments, frequently located in abandoned areas. However, interior surfaces visible through exterior openings complicate reconstruction by introducing outliers in the 3D point cloud. This paper introduces the WireNetV3 model for precise 3D segmentation of wire structures in color images. The model distinguishes between front and interior surfaces, filtering outliers during feature matching. Building on SegFormer 3D and WireNetV2, our approach integrates transformers with task-specific features and introduces a novel loss function, WireSDF, for distance calculation from wire axes. Evaluations on datasets featuring the Shukhov Tower and a church dome demonstrate that WireNetV3 surpasses existing methods in Intersection-over-Union metrics and 3D model accuracy.
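A distance-from-wire-axis loss like WireSDF rests on the distance from a point to a line segment; the abstract does not spell out the loss itself, but the underlying geometric primitive looks like this:

```python
import numpy as np

def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b
    (the wire axis), clamping the projection to the segment's ends."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))
```

A point cloud can then be filtered by discarding points whose minimum distance to any wire segment exceeds a tolerance — outliers from interior surfaces seen through openings would fail that test.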

Optical Memory and Neural Networks, vol. 34, no. 1, pp. S175–S184.
Citations: 0
Interaction between Learning and Evolution at the Formation of Functional Systems
IF 0.8 | Q4 OPTICS | Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601666
V. G. Red’ko, M. S. Burtsev

In the present work, a model of the interaction between learning and evolution in the formation of functional systems is constructed and studied. The behavior of a population of learning agents is analyzed. An agent’s control system consists of a set of functional systems, each of which includes a set of elements. The presence or absence of an element in a given functional system is encoded by the binary symbol 1 or 0. Each agent has a genotype and a phenotype, both encoded by chains of binary symbols representing the concatenated chains of functional systems. A functional system is completely formed when all of its elements are present. The more completely formed functional systems an agent has, the higher its fitness. The evolution of the population proceeds in generations. Within a generation, agent genotypes do not change, while phenotypes are optimized via learning, namely, via the formation of new functional systems. An agent’s phenotype at the beginning of a generation equals its genotype. At the end of the generation, the number of completely formed functional systems in the agent’s phenotype determines its fitness: the larger this number, the higher the fitness. Agents are selected into the next generation with probabilities proportional to their fitness, and each descendant receives the genotype of its parent with small mutations. Thus, selection acts on phenotypes, which are optimized by learning, while genotypes are inherited. The model was studied by computer simulation, and the effects of the interaction between learning and evolution in the formation of functional systems were analyzed.
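The described model maps directly onto a short simulation: binary chains partitioned into fixed-length functional systems, learning that completes partially formed systems in the phenotype, selection proportional to phenotype fitness, and inheritance of the genotype with mutations. The toy below follows that description; the specific sizes, learning rule, and rates are assumptions, not the paper's parameters:

```python
import random

SYS_LEN = 4       # elements per functional system
N_SYSTEMS = 5     # functional systems per agent

def fitness(chain):
    """Number of completely formed functional systems (all elements present)."""
    return sum(all(chain[i * SYS_LEN:(i + 1) * SYS_LEN]) for i in range(N_SYSTEMS))

def learn(genotype, steps=3):
    """Phenotype starts as the genotype; each learning step completes one
    partially formed functional system (all-zero systems cannot be started)."""
    pheno = list(genotype)
    for _ in range(steps):
        for i in range(N_SYSTEMS):
            block = pheno[i * SYS_LEN:(i + 1) * SYS_LEN]
            if 0 < sum(block) < SYS_LEN:
                pheno[i * SYS_LEN:(i + 1) * SYS_LEN] = [1] * SYS_LEN
                break
    return pheno

def evolve(pop_size=30, generations=40, mut_rate=0.02, seed=1):
    """Selection acts on learned phenotypes; genotypes are inherited with
    small per-bit mutations, as in the described model."""
    rng = random.Random(seed)
    length = N_SYSTEMS * SYS_LEN
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(learn(g)) + 1e-9 for g in pop]     # phenotype fitness
        parents = rng.choices(pop, weights=fits, k=pop_size)
        pop = [[bit ^ (rng.random() < mut_rate) for bit in g] for g in parents]
    return pop
```

Because selection sees the learned phenotype while only the genotype is inherited, runs of this kind exhibit the Baldwin-effect interaction the paper studies.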

Optical Memory and Neural Networks, vol. 34, no. 1, pp. S30–S46.
Citations: 0
Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding
IF 0.8 | Q4 OPTICS | Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601654
M. A. Patratskiy, A. K. Kovalev, A. I. Panov

Vision-Language-Action models have demonstrated remarkable capabilities in predicting agent movements within virtual environments and real-world scenarios based on visual observations and textual instructions. Although recent research has focused on enhancing spatial and temporal understanding independently, this paper presents a novel approach that integrates both aspects through visual prompting. We introduce a method that projects visual traces of key points from observations onto depth maps, enabling models to capture spatial and temporal information simultaneously. Experiments in SimplerEnv show that the mean number of successfully solved tasks increased by 4% compared to SpatialVLA and by 19% compared to TraceVLA. Furthermore, we show that this enhancement can be achieved with minimal training data, making it particularly valuable for real-world applications where data collection is challenging. The project page is available at https://ampiromax.github.io/ST-VLA.
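The visual-prompting idea — drawing a key-point trace onto the depth map so a single image carries both spatial (depth) and temporal (trajectory) cues — reduces, at its simplest, to marking trace pixels. This is a toy illustration; the method's actual rendering of traces is an assumption here:

```python
import numpy as np

def overlay_trace_on_depth(depth, trace, mark=0.0):
    """Return a copy of the depth map with trace pixels overwritten by `mark`,
    so the trajectory is visible to a model that consumes the depth image."""
    out = depth.copy()
    for row, col in trace:
        out[row, col] = mark
    return out

prompt = overlay_trace_on_depth(np.ones((4, 4)), [(0, 0), (1, 1), (2, 2)])
```

The marked image can then be fed to the model in place of (or alongside) the raw observation, requiring no architectural change.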

Optical Memory and Neural Networks, vol. 34, no. 1, pp. S72–S82.
Citations: 0
Decoding EEG Data with Deep Learning for Intelligence Quotient Assessment
IF 0.8 | Q4 OPTICS | Pub Date: 2025-09-17 | DOI: 10.3103/S1060992X24601921
Prithwijit Mukherjee, Anisha Halder Roy

Intelligence quotient (IQ) serves as a statistical gauge of an individual’s cognitive prowess. Measuring IQ is a formidable undertaking, mainly due to the intricacy of the human brain’s composition. At present, the assessment of human intelligence relies solely on conventional paper-based psychometric tests, which suffer from inherent discrepancies arising from the diversity of test formats and from language barriers. The primary objective of this study is to introduce an innovative, deep-learning-driven methodology for IQ measurement using electroencephalogram (EEG) signals. EEG signals are captured from participants during an IQ assessment session, and the participants’ IQ levels are then categorized into six tiers based on their test results: extremely low, borderline, low average, high average, superior, and very superior. An attention-mechanism-based Convolutional Neural Network–modified tanh Long Short-Term Memory (CNN-MTLSTM) model is devised to classify individuals into these IQ categories from EEG signals. A layer named the “input enhancement layer” is proposed and incorporated into CNN-MTLSTM to enhance its prediction accuracy. A CNN is used to automate the extraction of important information from the EEG features, and a new model, MTLSTM, is proposed as the classifier. The paper’s contributions encompass the novel MTLSTM architecture and the use of an attention mechanism to improve the classification accuracy of the CNN-MTLSTM model. The resulting model, incorporating an attention mechanism within the MTLSTM network, attains a remarkable average accuracy of 97.41% in assessing a person’s IQ level.
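The attention mechanism grafted onto the recurrent features works, generically, as score → softmax → weighted sum over time steps. A minimal stand-in follows; the paper's exact attention layer and the modified-tanh LSTM internals are not specified in the abstract, so everything here is an assumption:

```python
import numpy as np

def attention_pool(features, w):
    """features: (time, dim) hidden states; w: (dim,) scoring vector.
    Softmax over per-step scores, then a weighted sum of the states."""
    scores = features @ w
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ features

h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # toy hidden states
pooled = attention_pool(h, np.array([0.5, -0.5]))
```

The pooled vector emphasizes time steps the scoring vector rates highly, which is how attention lets a classifier focus on informative EEG segments.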

Optical Memory and Neural Networks, vol. 34, no. 3, pp. 441–456.
Citations: 0
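The abstract above does not detail the attention mechanism inside CNN-MTLSTM, but the standard attention-pooling step it alludes to — weighting a sequence of recurrent hidden states and collapsing them into one context vector before classification — can be sketched in a few lines of NumPy. All shapes and names here are hypothetical illustrations, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, w):
    """Attention pooling over a sequence of hidden states.

    H: (T, d) hidden states, e.g. one per EEG time step.
    w: (d,)  learned scoring vector (here random, for illustration).
    Returns the (d,) context vector and the (T,) attention weights.
    """
    scores = H @ w               # one relevance score per time step
    alpha = softmax(scores)      # weights are positive and sum to 1
    return alpha @ H, alpha      # weighted sum of hidden states

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 32))    # e.g. 50 EEG time steps, 32 features each
w = rng.normal(size=32)
ctx, alpha = attention_pool(H, w)
```

The context vector `ctx` would then feed a dense classification head over the six IQ tiers; in the paper the scoring would of course be learned jointly with the CNN and MTLSTM rather than fixed.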
Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph
IF 0.8 Q4 OPTICS Pub Date : 2025-09-17 DOI: 10.3103/S1060992X25600673
S. Linok, G. Naumov

We propose the OVIGo-3DHSG method: Open-Vocabulary Indoor Grounding of objects using a 3D Hierarchical Scene Graph. OVIGo-3DHSG represents an extensive indoor environment as a Hierarchical Scene Graph derived from sequences of RGB-D frames, utilizing a set of open-vocabulary foundation models and sensor data processing. The hierarchical representation explicitly models spatial relations across floors, rooms, locations, and objects. To effectively address complex queries involving spatial references to other objects, we integrate the hierarchical scene graph with a Large Language Model for multistep reasoning. This integration leverages inter-layer (e.g., room-to-object) and intra-layer (e.g., object-to-object) connections, enhancing spatial contextual understanding. We investigate the semantic and geometric accuracy of the hierarchical representation on Habitat Matterport 3D Semantic multi-floor scenes. Our approach demonstrates efficient scene comprehension and robust object grounding compared to existing methods. Overall, OVIGo-3DHSG shows strong potential for applications requiring spatial reasoning and understanding of indoor environments. Related materials can be found at https://github.com/linukc/OVIGo-3DHSG.

"Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph" — S. Linok, G. Naumov. Optical Memory and Neural Networks, vol. 34, no. 3, pp. 323–333 (2025). DOI: 10.3103/S1060992X25600673
Citations: 0
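The floor → room → object hierarchy described in the OVIGo-3DHSG abstract can be illustrated with a toy tree and a depth-first label query. This is a minimal sketch of the general idea only — the node types, names, and search are hypothetical stand-ins, not the paper's actual data structures or its open-vocabulary matching:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                               # "root" | "floor" | "room" | "object"
    children: list = field(default_factory=list)

def add_child(parent, child):
    parent.children.append(child)
    return child

def find_objects(node, label):
    """Depth-first search over the hierarchy for objects whose name contains label."""
    hits = []
    if node.kind == "object" and label in node.name:
        hits.append(node)
    for c in node.children:
        hits.extend(find_objects(c, label))
    return hits

# Build a tiny two-level scene: house -> floor -> room -> objects.
house = Node("house", "root")
floor1 = add_child(house, Node("floor_1", "floor"))
kitchen = add_child(floor1, Node("kitchen", "room"))
add_child(kitchen, Node("red mug", "object"))
add_child(kitchen, Node("table", "object"))

mugs = find_objects(house, "mug")
```

In the method itself, the inter-layer edges (room-to-object) and intra-layer edges (object-to-object) would let an LLM restrict a query like "the mug on the kitchen table" first to a room and then to objects related within it, rather than scanning the whole scene.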