Quantitative EEG Decomposition and Silver Howl Optimization for Multi-Stage Autism Spectrum Disorder Classification
Pub Date: 2025-12-24 | DOI: 10.3103/S1060992X25600454
Sherin M Wilson, K. S. Kannan
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder characterized by difficulties with cognition and behavior. Early and accurate diagnosis is crucial for effective intervention. However, existing machine learning methods for ASD detection face limitations, including inefficiencies in EEG signal noise removal, challenges in feature extraction, and difficulties in stage-wise classification. To address these challenges, the SilverHowl-QDecomp Framework is proposed to enhance EEG-based ASD classification through advanced signal processing and feature extraction techniques. The LaplaZ Filter effectively minimizes noise while preserving critical signal components, and normalization techniques ensure data consistency. Furthermore, the proposed feature extraction method captures nonlinear and dynamic EEG characteristics, improving classification accuracy by isolating essential features and reducing computational complexity. To enhance ASD stage classification, the SilverHowl Classifier is introduced, applied to the BCIAUT-P300 dataset and leveraging optimized hyperparameters to achieve better discrimination between ASD stages. With an accuracy of 0.985 and a precision of 0.98572, this method outperforms conventional techniques, offering a more reliable and precise classification framework. The proposed method contributes to personalized ASD interventions by enabling more accurate and stage-specific diagnoses.
{"title":"Quantitative EEG Decomposition and Silver Howl Optimization for Multi-Stage Autism Spectrum Disorder Classification","authors":"Sherin M Wilson, K. S. Kannan","doi":"10.3103/S1060992X25600454","DOIUrl":"10.3103/S1060992X25600454","url":null,"abstract":"<p>A complicated neurodevelopmental disorder, autism spectrum disorder (ASD) is represented by difficulties with cognition and behavior. Early and accurate diagnosis is crucial for effective intervention. However, existing machine learning methods for ASD detection face limitations, including inefficiencies in EEG signal noise removal, challenges in feature extraction, and difficulties in stage-wise classification. To address these challenges, the SilverHowl-QDecomp Framework is proposed to enhance EEG-based ASD classification through advanced signal processing and feature extraction techniques. The LaplaZ Filter effectively minimizes noise while preserving critical signal components and normalization techniques ensure data consistency. Furthermore, the proposed feature extraction method captures nonlinear and dynamic EEG characteristics, improving classification accuracy by isolating essential features and reducing computational complexity. To enhance ASD stage classification, the SilverHowl Classifier was introduced, implementing the BCIAUT-P300 dataset and leveraging optimized hyperparameters to achieve better discrimination between ASD stages. With an accuracy of 0.985 and a precision of 0.98572, this method performs better than conventional techniques, thereby offering a more reliable and precise classification framework. The proposed method contributes to personalized ASD interventions by enabling more accurate and stage-specific diagnoses.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 4","pages":"528 - 545"},"PeriodicalIF":0.8,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145808714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel Activation Sparsification Approach for Large Language Models
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601794
A. V. Demidovskij, E. O. Burmistrova, E. I. Zharikov
Large Language Models (LLMs) require substantial computational resources for inference, so recent advances in hardware design offer many opportunities to speed up LLM execution. For example, TPUs optimize calculations on data transformed into the coordinate (COO) sparse tensor format; the SparseCore processing unit that performs these calculations is heavily tailored to the extremely sparse embeddings of Deep Learning Recommendation Models. Another example of such hardware is Sparse Tensor Cores, which support the n:m sparsity structure (n zeros out of every m consecutive elements) and drastically reduce computation by compressing the original matrix into a dense one. Methods such as Wanda and SliceGPT prepare LLM weights to harness the latter. However, since the weights are the most valuable assets of any model, modifying the activations instead is an attractive alternative. This article introduces a novel dynamic sparsification algorithm called KurSparse, which applies a fine-grained n:m sparsity pattern to only a portion of the channels; this portion is selected using a kurtosis threshold ζ. The proposed method reduces MAC operations by 3.1x with an average quality drop of less than 2% for the LLaMA-3.1-8B model.
{"title":"Novel Activation Sparsification Approach for Large Language Models","authors":"A. V. Demidovskij, E. O. Burmistrova, E. I. Zharikov","doi":"10.3103/S1060992X25601794","DOIUrl":"10.3103/S1060992X25601794","url":null,"abstract":"<p>Large Language Models (LLMs) require a lot of computational resources for inference. That is why the latest advancements in hardware design may offer many possibilities for speeding the LLM up. For example, TPU optimize calculations on data, transformed into the Coordinate sparse tensor format. The SparseCore processing unit that performs the calculations is heavily tailored for the extremely sparse embeddings of Deep Learning Recommendation Models. The other example of the enhanced hardware is Sparse Tensor Cores, that offer support for <span>(n:m)</span> data structure (<span>(n)</span> zeroes out of every subsequent <span>(m)</span> elements), that allows to drastically reduce the calculations by compressing the original matrix into a dense one. Methods like Wanda and SliceGPT prepare LLM weights to harness the power of the latter. However, as the weights are the most crucial assets of any model, it appears to be a good idea to modify the activations instead. This article introduces a novel dynamic sparsification algorithm called KurSparse , which proposes fine-grained <i>n</i> : <i>m</i> sparsity pattern, that affects only a portion of channels. This portion is selected with kurtosis threshold <span>(zeta )</span>. The proposed method shows significant reduction in MAC operations by 3.1x with average quality drop for LLaMA-3.1-8B model less than 2%.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S166 - S174"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Continual Learning with Columnar Spiking Neural Networks
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601629
D. Larionov, N. Bazenkov, M. Kiselev
Continual learning is a key feature of biological neural systems, but artificial neural networks often suffer from catastrophic forgetting. Biologically plausible learning algorithms, rather than backpropagation, may enable stable continual learning. This study proposes columnar-organized spiking neural networks (SNNs) with local learning rules to support continual learning and mitigate catastrophic forgetting. Using CoLaNET (Columnar Layered Network), we show that its microcolumns adapt most efficiently to new tasks when those tasks lack shared structure with prior learning. We demonstrate how CoLaNET hyperparameters govern the trade-off between retaining old knowledge (stability) and acquiring new information (plasticity). We evaluate CoLaNET on two benchmarks: Permuted MNIST (ten sequential pixel-permuted tasks) and a two-task MNIST/EMNIST setup. Our model learns ten sequential tasks effectively, maintaining 92% accuracy on each. It shows low forgetting, with only 4% performance degradation on the first task after training on nine subsequent tasks.
{"title":"Continual Learning with Columnar Spiking Neural Networks","authors":"D. Larionov, N. Bazenkov, M. Kiselev","doi":"10.3103/S1060992X25601629","DOIUrl":"10.3103/S1060992X25601629","url":null,"abstract":"<p>Continual learning is a key feature of biological neural systems, but artificial neural networks often suffer from catastrophic forgetting. Instead of backpropagation, biologically plausible learning algorithms may enable stable continual learning. This study proposes columnar-organized spiking neural networks (SNNs) with local learning rules for continual learning and catastrophic forgetting. Using CoLaNET (Columnar Layered Network), we show that its microcolumns adapt most efficiently to new tasks when they lack shared structure with prior learning. We demonstrate how CoLaNET hyperparameters govern the trade-off between retaining old knowledge (stability) and acquiring new information (plasticity). We evaluate CoLaNET on two benchmarks: Permuted MNIST (ten sequential pixel-permuted tasks) and a two-task MNIST/EMNIST setup. Our model learns ten sequential tasks effectively, maintaining 92% accuracy on each. It shows low forgetting, with only 4% performance degradation on the first task after training on nine subsequent tasks.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S58 - S71"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transfer Learning Approach Based on Generative Adaptation of Low-Dimensional Latent Representation
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25700213
M. M. Leonov, A. A. Soroka, A. G. Trofimov
We propose a universal framework for neural network composition based on generative adaptation in a low-dimensional latent space. The method connects two pretrained deep neural networks by introducing an adapter trained with a Wasserstein GAN, enabling knowledge transfer across domains without modifying the original models. We facilitate efficient alignment between neural layers with different semantics and dimensionalities by encoding intermediate representations into a fixed-size latent space via autoencoders. Furthermore, we introduce an improved clustering-based algorithm to detect optimal connection points for both networks and reduce the computational cost. Experiments with models combining pretrained ResNet and DistilBERT networks for image classification and regression tasks demonstrate the validity and advantages of our approach in cross-modal tasks. The adapter achieves high performance with minimal overhead, enabling flexible reuse of pretrained models in new domains without modification of their weights.
{"title":"Transfer Learning Approach Based on Generative Adaptation of Low-Dimensional Latent Representation","authors":"M. M. Leonov, A. A. Soroka, A. G. Trofimov","doi":"10.3103/S1060992X25700213","DOIUrl":"10.3103/S1060992X25700213","url":null,"abstract":"<p>We propose a universal framework for neural network composition based on generative adaptation in a low-dimensional latent space. The method connects two pretrained deep neural networks by introducing an adapter trained with a Wasserstein GAN, enabling knowledge transfer across domains without modifying the original models. We facilitate efficient alignment between neural layers with different semantics and dimensionalities by encoding intermediate representations into a fixed-size latent space via autoencoders. Furthermore, we introduce an improved clustering-based algorithm to detect optimal connection points for both networks and reduce the computational cost. Experiments with models combining pretrained ResNet and DistilBERT networks for image classification and regression tasks demonstrate the validity and advantages of our approach in cross-modal tasks. The adapter achieves high performance with minimal overhead, enabling flexible reuse of pretrained models in new domains without modification of their weights.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S148 - S157"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interpretation of Kolmogorov–Arnold Networks Using the Example of Solving the Inverse Problem of Photoluminescence Spectroscopy
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25602052
G. Kupriyanov, I. Isaev, K. Laptinskiy, T. Dolenko, S. Dolenko
Kolmogorov–Arnold networks (KANs) are notable not only for their approximation capabilities but also for their potential in model interpretability. This work studies the interpretative capabilities of KANs using the example of solving the inverse problem of luminescence spectroscopy to create a multimodal carbon nanosensor for metal ions in water. The improved visual interpretation, which uses color gradation to reflect the interrelation of the inputs and of the features processed by the model, made it possible to identify the basic principles of KAN operation and relate them to physical observations from the experiment. A modification of KAN with an architecturally integrated interpretation mechanism, λ-KAN, is proposed. The mathematically proven interpretative capabilities of λ-KAN were confirmed on the inverse problem of luminescence spectroscopy. λ-KAN combines approximation capabilities at the level of neural network approaches with transparent interpretation comparable to linear regression, which makes it a promising machine learning architecture for tasks requiring valid interpretation mechanisms. The code used in this work is posted on GitHub.
{"title":"Interpretation of Kolmogorov–Arnold Networks Using the Example of Solving the Inverse Problem of Photoluminescence Spectroscopy","authors":"G. Kupriyanov, I. Isaev, K. Laptinskiy, T. Dolenko, S. Dolenko","doi":"10.3103/S1060992X25602052","DOIUrl":"10.3103/S1060992X25602052","url":null,"abstract":"<p>Kolmogorov–Arnold networks (KANs) are not only notable for their approximation capabilities but also for their potential in model interpretability. This work focuses on the study of the interpretative capabilities of KAN using the example of solving the luminescent spectroscopy inverse problem to create a multimodal carbon nanosensor for metal ions in water. The improved visual interpretation, which considers interrelation of the inputs and of the features processed by the model using color gradation, made it possible to identify the basic principles of KAN operation and collocate them with physical experimental observations. A modification of KAN with an architecturally integrated interpretation mechanism is proposed: λ-KAN. Mathematically proved interpretative capabilities of the λ‑KAN were confirmed on the inverse problem of luminescent spectroscopy. λ-KAN combines approximation capabilities at the level of neural network approaches with a transparent interpretation comparable to linear regression, which makes it a promising machine learning architecture for using in tasks requiring valid interpretation mechanisms. The code used in this work is posted on GitHub.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S125 - S134"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Networks-Based Routing Congestion Prediction Using Initial Layout Parameters
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601538
M. Saibodalov, M. Dashiev, I. Karandashev, N. Zheludkov, E. Kocheva
This paper considers the problem of congestion map prediction at the pre-routing stage of VLSI layout design of digital blocks using neural network models. Early congestion prediction allows the VLSI design engineer to modify the floorplan, macro placement, and input/output port placement to prevent interconnect routing issues at later stages. This, in turn, reduces the number of EDA tool runs and the overall circuit design runtime. In this work we propose using initial layout parameters as input channels of the U-Net architecture, an approach not considered in previous works. These parameters enhance the model's ability to predict routing congestion with greater accuracy. As a result, we achieve a Pearson correlation with the target maps of around 0.83, indicating strong model performance.
{"title":"Neural Networks-Based Routing Congestion Prediction Using Initial Layout Parameters","authors":"M. Saibodalov, M. Dashiev, I. Karandashev, N. Zheludkov, E. Kocheva","doi":"10.3103/S1060992X25601538","DOIUrl":"10.3103/S1060992X25601538","url":null,"abstract":"<p>This paper considers the problem of congestion map prediction at the pre-routing stage of VLSI layout design of digital blocks by applying neural network models. Early prediction of congestion will allow the VLSI design engineer to modify floorplan, macro placement and input-output port placement to prevent interconnect routing issues at later stages. This, in turn, reduces the number of EDA tool runs and the overall circuit design runtime. In this work we propose the use of initial layout parameters as input channels in the U-Net architecture, which was not considered in other works. These parameters enhance the model’s ability to predict routing congestion with greater accuracy. As a result, we achieved a Pearson correlation with target maps of around 0.83, indicating strong model performance.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S94 - S101"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Use of Sigma-Pi-Neural Networks for Approximation of the Optimality Criterion in the J-SNAC Scheme for Aircraft Motion Control
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601976
Yu. V. Tiumentsev, R. A. Tskhai
Aircraft are currently expected to carry out a wide range of tasks. A complicating factor is incomplete and inaccurate knowledge of the properties of the object under investigation and of the conditions in which it operates. In particular, various abnormal situations may arise during flight, such as equipment failures and structural damage, which must be remedied by reconfiguring the control system or the controls of the aircraft. The aircraft control system should therefore be able to work effectively in these conditions by rapidly changing the parameters and/or structure of the control laws. Adaptive control techniques allow this requirement to be met. One approach to the synthesis of adaptive laws for dynamic system control is the application of machine learning methods. The article proposes using one of the variants of the adaptive critic method for this purpose, namely the J-SNAC scheme, and considers the algorithm it implements. A distinctive feature of the proposed J-SNAC variant is the use of a sigma-pi network to implement the critic included in this scheme. Data from a computational experiment on the lateral motion of a maneuverable aircraft demonstrate the efficiency and the prospects of using sigma-pi networks in J-SNAC.
{"title":"Use of Sigma-Pi-Neural Networks for Approximation of the Optimality Criterion in the J-SNAC Scheme for Aircraft Motion Control","authors":"Yu. V. Tiumentsev, R. A. Tskhai","doi":"10.3103/S1060992X25601976","DOIUrl":"10.3103/S1060992X25601976","url":null,"abstract":"<p>Currently, there are a large number of tasks to be carried out by aircraft. The complicating factor in this case is incomplete and inaccurate knowledge of the properties of the object under investigation and the conditions in which it operates. In particular, during the flight may arise various abnormal situations such as equipment failures and structural damage that need to be remedied by reconfiguring the control system or controls of the aircraft. The aircraft control system should be able to work effectively in these conditions by rapidly changing the parameters and/or structure of the control laws. Adaptive control techniques allow this requirement to be met. One of the approaches to the synthesis of adaptive laws for dynamic systems control is the application of machine learning methods. The article proposes to use for this purpose one of the variants of the adaptive critic method, namely the J-SNAC scheme. The algorithm implemented by this scheme is considered. A distinctive feature of the proposed J-SNAC variant is the use of sigma-pi network to implement the critic included in this scheme. Data from the computational experiment carried out in relation to the lateral motion of a maneuverable aircraft demonstrates the efficiency and prospects of using sigma-pi-net in J-SNAC.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S102 - S114"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Addressing Data Scarcity in Spectroscopy with Variational Autoencoders
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25700201
A. Mushchina, I. Isaev, O. Sarmanova, T. Dolenko, S. Dolenko
Solving inverse problems in many areas of natural science, including spectroscopy, is often challenging due to well-known properties of such problems, including nonlinearity, high input dimension, and ill-posedness. One approach that can deal with these difficulties is the use of machine learning methods, e.g., artificial neural networks. However, machine learning methods require a large amount of representative data, which is often hard and expensive to obtain experimentally. An alternative is to generate additional data with generative neural network systems, e.g., variational autoencoders. In this study, we investigate the feasibility of such an approach, its merits, and the difficulties of its use, using the example of optical absorption spectroscopy of multicomponent solutions of inorganic salts applied to determining the concentrations of the solution components.
{"title":"Addressing Data Scarcity in Spectroscopy with Variational Autoencoders","authors":"A. Mushchina, I. Isaev, O. Sarmanova, T. Dolenko, S. Dolenko","doi":"10.3103/S1060992X25700201","DOIUrl":"10.3103/S1060992X25700201","url":null,"abstract":"<p>Solving inverse problems in many areas of natural science, including spectroscopy, is often a challenge due to well-known properties of such problems, including nonlinearity, high input dimension, and being ill-posed or incorrect. One of the approaches that may deal with these problems is the use of machine learning methods, e.g. artificial neural networks. However, machine learning methods require a large amount of representative data, which is often hard and expensive to obtain in experiment. An alternative may be generation of additional data with generative neural network systems, e.g. variational autoencoders. In this study, we investigate feasibility of such approach, its merits and difficulties of its use at the example of optical absorption spectroscopy of multicomponent solutions of inorganic salts applied to determine the concentrations of the components of a solution.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S115 - S124"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic Space Embedding of Speech Act Intensions
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601769
A. V. Samsonovich, D. L. Khabarov, N. A. Belyaev
This work presents a semantic map of intensions, understood here as relational connotations of speech acts. The result is a tool that consists of a dataset of intensions, its embedding in a semantic space, and a graph of relations among the intensions, plus a neural network trained to recognize given intensions in utterances. The tool can be used for creating formal representations of social relational aspects of speech acts in a dialogue. The method of constructing the map is based on using OpenAI ChatGPT, fine-tuning a large language model (LLM), linear algebra, and graph theory. The constructed model of semantic space of intensions extends beyond the popular settings for sentiment or tonality analysis of texts in natural language. As a general model applicable to virtually any paradigm of social interaction, it can be used for constructing specialized models of limited paradigms. Therefore, the developed tool can enable efficient integration of LLMs with cognitive architectures, such as eBICA, for building socially emotional conversational agents.
{"title":"Semantic Space Embedding of Speech Act Intensions","authors":"A. V. Samsonovich, D. L. Khabarov, N. A. Belyaev","doi":"10.3103/S1060992X25601769","DOIUrl":"10.3103/S1060992X25601769","url":null,"abstract":"<p>This work presents a semantic map of intensions, understood here as relational connotations of speech acts. The result is a tool that consists of a dataset of intensions, its embedding in a semantic space, and a graph of relations among the intensions, plus a neural network trained to recognize given intensions in utterances. The tool can be used for creating formal representations of social relational aspects of speech acts in a dialogue. The method of constructing the map is based on using OpenAI ChatGPT, fine-tuning a large language model (LLM), linear algebra, and graph theory. The constructed model of semantic space of intensions extends beyond the popular settings for sentiment or tonality analysis of texts in natural language. As a general model applicable to virtually any paradigm of social interaction, it can be used for constructing specialized models of limited paradigms. Therefore, the developed tool can enable efficient integration of LLMs with cognitive architectures, such as eBICA, for building socially emotional conversational agents.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S1 - S15"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Study of Modern Zeroth-Order Optimization Methods for LLM Fine-Tuning
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601770
A. V. Demidovskij, A. I. Trutnev
Large Language Models (LLMs) are widely employed across a broad range of applications due to their versatility and state-of-the-art performance. However, as usage scenarios grow, there is a pressing demand for task-specific adaptation of LLMs through fine-tuning. While full fine-tuning (FT) remains preferred in terms of quality, its high memory and computation requirements limit its practical use, especially for large models. Parameter-efficient fine-tuning (PEFT) techniques such as LoRA mitigate this issue by updating a small subset of model parameters, but they still require extensive resources because of backpropagation. In contrast, zeroth-order (ZO) optimization methods, which approximate gradients using only forward passes, offer an attractive alternative for memory-constrained environments: eliminating backpropagation reduces memory overhead to an inference-level footprint. Over 2024–2025, several ZO techniques have been proposed that aim to balance efficiency and performance. This paper presents a comparative analysis of 12 zeroth-order optimization methods applied to LLM fine-tuning, evaluated by memory utilization, quality, fine-tuning time, and convergence. According to the results, the best method in terms of memory reduction is ZO-SGD-Sign, with a 42.82% memory reduction; the best quality and fine-tuning time among zeroth-order methods relative to SGD are achieved with LoHO, with a 0.6% quality drop and an 11.73% increase in fine-tuning time, while no ZO method currently matches the convergence efficiency of Adam and AdamW.
{"title":"Performance Study of Modern Zeroth-Order Optimization Methods for LLM Fine-Tuning","authors":"A. V. Demidovskij, A. I. Trutnev","doi":"10.3103/S1060992X25601770","DOIUrl":"10.3103/S1060992X25601770","url":null,"abstract":"<p>Large Language Models (LLMs) are widely employed across a broad range of applications due to their versatility and state-of-the-art performance. However, as usage scenarios grow, there is a pressing demand for task-specific adaptation of LLMs through fine-tuning. While full fine-tuning (FT) remains the most preferred in terms of quality, its high memory and computation requirements limit its practical use, especially for LLMs. Parameter-efficient fine-tuning (PEFT) techniques, such as LoRA, mitigate this issue by updating a small subset of model parameters. However, it requires an extensive number of resources due to backpropagation. In contrast, zeroth-order (ZO) optimization methods, which approximate gradients using only forward passes, offer an attractive alternative for memory-constrained environments by eliminating the need for backpropagation, thus reducing memory overhead to inference-level footprints. Over the 2024–2025 year, several ZO techniques have been proposed, aiming to balance efficiency and performance. This paper introduces the comparative analysis of 12 zeroth-order optimization methods applied for the LLM fine-tuning task by memory utilization, quality, fine-tuning time, and convergence. According to the results, the best method in terms of memory reduction is ZO-SGD-Sign: 42.82% memory reduction; the best quality and fine-tuning time across zeroth-order methods compared to SGD is achieved with LoHO: 0.6% quality drop and 11.73% fine-tuning time increase, while no ZO method currently matches the Adam and AdamW convergence efficiency.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S16 - S29"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}