Quantitative EEG Decomposition and Silver Howl Optimization for Multi-Stage Autism Spectrum Disorder Classification
Pub Date: 2025-12-24 | DOI: 10.3103/S1060992X25600454
Sherin M Wilson, K. S. Kannan
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder characterized by difficulties with cognition and behavior. Early and accurate diagnosis is crucial for effective intervention. However, existing machine learning methods for ASD detection face limitations, including inefficiencies in EEG signal noise removal, challenges in feature extraction, and difficulties in stage-wise classification. To address these challenges, the SilverHowl-QDecomp Framework is proposed to enhance EEG-based ASD classification through advanced signal processing and feature extraction techniques. The LaplaZ Filter effectively minimizes noise while preserving critical signal components, and normalization techniques ensure data consistency. Furthermore, the proposed feature extraction method captures nonlinear and dynamic EEG characteristics, improving classification accuracy by isolating essential features and reducing computational complexity. To enhance ASD stage classification, the SilverHowl Classifier is introduced, applied to the BCIAUT-P300 dataset and leveraging optimized hyperparameters to achieve better discrimination between ASD stages. With an accuracy of 0.985 and a precision of 0.98572, this method outperforms conventional techniques, offering a more reliable and precise classification framework. The proposed method contributes to personalized ASD interventions by enabling more accurate and stage-specific diagnoses.
{"title":"Quantitative EEG Decomposition and Silver Howl Optimization for Multi-Stage Autism Spectrum Disorder Classification","authors":"Sherin M Wilson, K. S. Kannan","doi":"10.3103/S1060992X25600454","DOIUrl":"10.3103/S1060992X25600454","url":null,"abstract":"<p>A complicated neurodevelopmental disorder, autism spectrum disorder (ASD) is represented by difficulties with cognition and behavior. Early and accurate diagnosis is crucial for effective intervention. However, existing machine learning methods for ASD detection face limitations, including inefficiencies in EEG signal noise removal, challenges in feature extraction, and difficulties in stage-wise classification. To address these challenges, the SilverHowl-QDecomp Framework is proposed to enhance EEG-based ASD classification through advanced signal processing and feature extraction techniques. The LaplaZ Filter effectively minimizes noise while preserving critical signal components and normalization techniques ensure data consistency. Furthermore, the proposed feature extraction method captures nonlinear and dynamic EEG characteristics, improving classification accuracy by isolating essential features and reducing computational complexity. To enhance ASD stage classification, the SilverHowl Classifier was introduced, implementing the BCIAUT-P300 dataset and leveraging optimized hyperparameters to achieve better discrimination between ASD stages. With an accuracy of 0.985 and a precision of 0.98572, this method performs better than conventional techniques, thereby offering a more reliable and precise classification framework. The proposed method contributes to personalized ASD interventions by enabling more accurate and stage-specific diagnoses.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 4","pages":"528 - 545"},"PeriodicalIF":0.8,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145808714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel Activation Sparsification Approach for Large Language Models
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601794
A. V. Demidovskij, E. O. Burmistrova, E. I. Zharikov
Large Language Models (LLMs) require substantial computational resources for inference, so recent advances in hardware design offer many opportunities to speed up LLM execution. For example, TPUs optimize calculations on data transformed into the coordinate (COO) sparse tensor format; the SparseCore processing unit that performs these calculations is heavily tailored to the extremely sparse embeddings of Deep Learning Recommendation Models. Another example of such hardware is Sparse Tensor Cores, which support the n:m sparsity structure (n zeros out of every m consecutive elements) and drastically reduce computation by compressing the original matrix into a dense one. Methods such as Wanda and SliceGPT prepare LLM weights to harness the latter. However, since the weights are the most valuable assets of any model, modifying the activations instead is an attractive alternative. This article introduces a novel dynamic sparsification algorithm called KurSparse, which applies a fine-grained n:m sparsity pattern to only a portion of the channels; this portion is selected using a kurtosis threshold ζ. The proposed method reduces MAC operations by 3.1x with an average quality drop of less than 2% for the LLaMA-3.1-8B model.
{"title":"Novel Activation Sparsification Approach for Large Language Models","authors":"A. V. Demidovskij, E. O. Burmistrova, E. I. Zharikov","doi":"10.3103/S1060992X25601794","DOIUrl":"10.3103/S1060992X25601794","url":null,"abstract":"<p>Large Language Models (LLMs) require a lot of computational resources for inference. That is why the latest advancements in hardware design may offer many possibilities for speeding the LLM up. For example, TPU optimize calculations on data, transformed into the Coordinate sparse tensor format. The SparseCore processing unit that performs the calculations is heavily tailored for the extremely sparse embeddings of Deep Learning Recommendation Models. The other example of the enhanced hardware is Sparse Tensor Cores, that offer support for <span>(n:m)</span> data structure (<span>(n)</span> zeroes out of every subsequent <span>(m)</span> elements), that allows to drastically reduce the calculations by compressing the original matrix into a dense one. Methods like Wanda and SliceGPT prepare LLM weights to harness the power of the latter. However, as the weights are the most crucial assets of any model, it appears to be a good idea to modify the activations instead. This article introduces a novel dynamic sparsification algorithm called KurSparse , which proposes fine-grained <i>n</i> : <i>m</i> sparsity pattern, that affects only a portion of channels. This portion is selected with kurtosis threshold <span>(zeta )</span>. The proposed method shows significant reduction in MAC operations by 3.1x with average quality drop for LLaMA-3.1-8B model less than 2%.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S166 - S174"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Continual Learning with Columnar Spiking Neural Networks
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601629
D. Larionov, N. Bazenkov, M. Kiselev
Continual learning is a key feature of biological neural systems, but artificial neural networks often suffer from catastrophic forgetting. Biologically plausible learning algorithms, rather than backpropagation, may enable stable continual learning. This study proposes columnar-organized spiking neural networks (SNNs) with local learning rules to support continual learning and mitigate catastrophic forgetting. Using CoLaNET (Columnar Layered Network), we show that its microcolumns adapt most efficiently to new tasks when those tasks lack shared structure with prior learning. We demonstrate how CoLaNET hyperparameters govern the trade-off between retaining old knowledge (stability) and acquiring new information (plasticity). We evaluate CoLaNET on two benchmarks: Permuted MNIST (ten sequential pixel-permuted tasks) and a two-task MNIST/EMNIST setup. Our model learns ten sequential tasks effectively, maintaining 92% accuracy on each. It shows low forgetting, with only 4% performance degradation on the first task after training on nine subsequent tasks.
{"title":"Continual Learning with Columnar Spiking Neural Networks","authors":"D. Larionov, N. Bazenkov, M. Kiselev","doi":"10.3103/S1060992X25601629","DOIUrl":"10.3103/S1060992X25601629","url":null,"abstract":"<p>Continual learning is a key feature of biological neural systems, but artificial neural networks often suffer from catastrophic forgetting. Instead of backpropagation, biologically plausible learning algorithms may enable stable continual learning. This study proposes columnar-organized spiking neural networks (SNNs) with local learning rules for continual learning and catastrophic forgetting. Using CoLaNET (Columnar Layered Network), we show that its microcolumns adapt most efficiently to new tasks when they lack shared structure with prior learning. We demonstrate how CoLaNET hyperparameters govern the trade-off between retaining old knowledge (stability) and acquiring new information (plasticity). We evaluate CoLaNET on two benchmarks: Permuted MNIST (ten sequential pixel-permuted tasks) and a two-task MNIST/EMNIST setup. Our model learns ten sequential tasks effectively, maintaining 92% accuracy on each. It shows low forgetting, with only 4% performance degradation on the first task after training on nine subsequent tasks.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S58 - S71"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transfer Learning Approach Based on Generative Adaptation of Low-Dimensional Latent Representation
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25700213
M. M. Leonov, A. A. Soroka, A. G. Trofimov
We propose a universal framework for neural network composition based on generative adaptation in a low-dimensional latent space. The method connects two pretrained deep neural networks by introducing an adapter trained with a Wasserstein GAN, enabling knowledge transfer across domains without modifying the original models. We facilitate efficient alignment between neural layers with different semantics and dimensionalities by encoding intermediate representations into a fixed-size latent space via autoencoders. Furthermore, we introduce an improved clustering-based algorithm to detect optimal connection points for both networks and reduce the computational cost. Experiments with models combining pretrained ResNet and DistilBERT networks for image classification and regression tasks demonstrate the validity and advantages of our approach in cross-modal tasks. The adapter achieves high performance with minimal overhead, enabling flexible reuse of pretrained models in new domains without modification of their weights.
{"title":"Transfer Learning Approach Based on Generative Adaptation of Low-Dimensional Latent Representation","authors":"M. M. Leonov, A. A. Soroka, A. G. Trofimov","doi":"10.3103/S1060992X25700213","DOIUrl":"10.3103/S1060992X25700213","url":null,"abstract":"<p>We propose a universal framework for neural network composition based on generative adaptation in a low-dimensional latent space. The method connects two pretrained deep neural networks by introducing an adapter trained with a Wasserstein GAN, enabling knowledge transfer across domains without modifying the original models. We facilitate efficient alignment between neural layers with different semantics and dimensionalities by encoding intermediate representations into a fixed-size latent space via autoencoders. Furthermore, we introduce an improved clustering-based algorithm to detect optimal connection points for both networks and reduce the computational cost. Experiments with models combining pretrained ResNet and DistilBERT networks for image classification and regression tasks demonstrate the validity and advantages of our approach in cross-modal tasks. The adapter achieves high performance with minimal overhead, enabling flexible reuse of pretrained models in new domains without modification of their weights.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S148 - S157"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interpretation of Kolmogorov–Arnold Networks Using the Example of Solving the Inverse Problem of Photoluminescence Spectroscopy
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25602052
G. Kupriyanov, I. Isaev, K. Laptinskiy, T. Dolenko, S. Dolenko
Kolmogorov–Arnold networks (KANs) are notable not only for their approximation capabilities but also for their potential in model interpretability. This work studies the interpretative capabilities of KANs using the example of solving the inverse problem of luminescence spectroscopy to create a multimodal carbon nanosensor for metal ions in water. The improved visual interpretation, which uses color gradation to reflect the interrelation of the inputs and of the features processed by the model, made it possible to identify the basic principles of KAN operation and relate them to physical observations from the experiment. A modification of KAN with an architecturally integrated interpretation mechanism, λ-KAN, is proposed. The mathematically proven interpretative capabilities of λ-KAN were confirmed on the inverse problem of luminescence spectroscopy. λ-KAN combines approximation capabilities at the level of neural network approaches with transparent interpretation comparable to linear regression, which makes it a promising machine learning architecture for tasks requiring valid interpretation mechanisms. The code used in this work is posted on GitHub.
{"title":"Interpretation of Kolmogorov–Arnold Networks Using the Example of Solving the Inverse Problem of Photoluminescence Spectroscopy","authors":"G. Kupriyanov, I. Isaev, K. Laptinskiy, T. Dolenko, S. Dolenko","doi":"10.3103/S1060992X25602052","DOIUrl":"10.3103/S1060992X25602052","url":null,"abstract":"<p>Kolmogorov–Arnold networks (KANs) are not only notable for their approximation capabilities but also for their potential in model interpretability. This work focuses on the study of the interpretative capabilities of KAN using the example of solving the luminescent spectroscopy inverse problem to create a multimodal carbon nanosensor for metal ions in water. The improved visual interpretation, which considers interrelation of the inputs and of the features processed by the model using color gradation, made it possible to identify the basic principles of KAN operation and collocate them with physical experimental observations. A modification of KAN with an architecturally integrated interpretation mechanism is proposed: λ-KAN. Mathematically proved interpretative capabilities of the λ‑KAN were confirmed on the inverse problem of luminescent spectroscopy. λ-KAN combines approximation capabilities at the level of neural network approaches with a transparent interpretation comparable to linear regression, which makes it a promising machine learning architecture for using in tasks requiring valid interpretation mechanisms. The code used in this work is posted on GitHub.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S125 - S134"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Networks-Based Routing Congestion Prediction Using Initial Layout Parameters
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601538
M. Saibodalov, M. Dashiev, I. Karandashev, N. Zheludkov, E. Kocheva
This paper considers the problem of congestion map prediction at the pre-routing stage of VLSI layout design of digital blocks using neural network models. Early congestion prediction allows the VLSI design engineer to modify the floorplan, macro placement, and input/output port placement to prevent interconnect routing issues at later stages. This, in turn, reduces the number of EDA tool runs and the overall circuit design runtime. In this work we propose using initial layout parameters as input channels of the U-Net architecture, an approach not considered in previous works. These parameters enhance the model's ability to predict routing congestion with greater accuracy. As a result, we achieve a Pearson correlation with the target maps of around 0.83, indicating strong model performance.
{"title":"Neural Networks-Based Routing Congestion Prediction Using Initial Layout Parameters","authors":"M. Saibodalov, M. Dashiev, I. Karandashev, N. Zheludkov, E. Kocheva","doi":"10.3103/S1060992X25601538","DOIUrl":"10.3103/S1060992X25601538","url":null,"abstract":"<p>This paper considers the problem of congestion map prediction at the pre-routing stage of VLSI layout design of digital blocks by applying neural network models. Early prediction of congestion will allow the VLSI design engineer to modify floorplan, macro placement and input-output port placement to prevent interconnect routing issues at later stages. This, in turn, reduces the number of EDA tool runs and the overall circuit design runtime. In this work we propose the use of initial layout parameters as input channels in the U-Net architecture, which was not considered in other works. These parameters enhance the model’s ability to predict routing congestion with greater accuracy. As a result, we achieved a Pearson correlation with target maps of around 0.83, indicating strong model performance.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S94 - S101"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Use of Sigma-Pi-Neural Networks for Approximation of the Optimality Criterion in the J-SNAC Scheme for Aircraft Motion Control
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601976
Yu. V. Tiumentsev, R. A. Tskhai
Aircraft are currently expected to carry out a wide range of tasks. A complicating factor is incomplete and inaccurate knowledge of the properties of the object under investigation and of the conditions in which it operates. In particular, various abnormal situations may arise during flight, such as equipment failures and structural damage, which must be remedied by reconfiguring the control system or the controls of the aircraft. The aircraft control system should therefore be able to work effectively in these conditions by rapidly changing the parameters and/or structure of the control laws. Adaptive control techniques allow this requirement to be met. One approach to the synthesis of adaptive laws for dynamic system control is the application of machine learning methods. The article proposes using one of the variants of the adaptive critic method for this purpose, namely the J-SNAC scheme, and considers the algorithm it implements. A distinctive feature of the proposed J-SNAC variant is the use of a sigma-pi network to implement the critic included in this scheme. Data from a computational experiment on the lateral motion of a maneuverable aircraft demonstrate the efficiency and the prospects of using sigma-pi networks in J-SNAC.
{"title":"Use of Sigma-Pi-Neural Networks for Approximation of the Optimality Criterion in the J-SNAC Scheme for Aircraft Motion Control","authors":"Yu. V. Tiumentsev, R. A. Tskhai","doi":"10.3103/S1060992X25601976","DOIUrl":"10.3103/S1060992X25601976","url":null,"abstract":"<p>Currently, there are a large number of tasks to be carried out by aircraft. The complicating factor in this case is incomplete and inaccurate knowledge of the properties of the object under investigation and the conditions in which it operates. In particular, during the flight may arise various abnormal situations such as equipment failures and structural damage that need to be remedied by reconfiguring the control system or controls of the aircraft. The aircraft control system should be able to work effectively in these conditions by rapidly changing the parameters and/or structure of the control laws. Adaptive control techniques allow this requirement to be met. One of the approaches to the synthesis of adaptive laws for dynamic systems control is the application of machine learning methods. The article proposes to use for this purpose one of the variants of the adaptive critic method, namely the J-SNAC scheme. The algorithm implemented by this scheme is considered. A distinctive feature of the proposed J-SNAC variant is the use of sigma-pi network to implement the critic included in this scheme. Data from the computational experiment carried out in relation to the lateral motion of a maneuverable aircraft demonstrates the efficiency and prospects of using sigma-pi-net in J-SNAC.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S102 - S114"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Addressing Data Scarcity in Spectroscopy with Variational Autoencoders
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25700201
A. Mushchina, I. Isaev, O. Sarmanova, T. Dolenko, S. Dolenko
Solving inverse problems in many areas of natural science, including spectroscopy, is often challenging due to well-known properties of such problems, including nonlinearity, high input dimension, and ill-posedness. One approach that can deal with these difficulties is the use of machine learning methods, e.g., artificial neural networks. However, machine learning methods require a large amount of representative data, which is often hard and expensive to obtain experimentally. An alternative is to generate additional data with generative neural network systems, e.g., variational autoencoders. In this study, we investigate the feasibility of such an approach, its merits, and the difficulties of its use, using the example of optical absorption spectroscopy of multicomponent solutions of inorganic salts applied to determining the concentrations of the solution components.
{"title":"Addressing Data Scarcity in Spectroscopy with Variational Autoencoders","authors":"A. Mushchina, I. Isaev, O. Sarmanova, T. Dolenko, S. Dolenko","doi":"10.3103/S1060992X25700201","DOIUrl":"10.3103/S1060992X25700201","url":null,"abstract":"<p>Solving inverse problems in many areas of natural science, including spectroscopy, is often a challenge due to well-known properties of such problems, including nonlinearity, high input dimension, and being ill-posed or incorrect. One of the approaches that may deal with these problems is the use of machine learning methods, e.g. artificial neural networks. However, machine learning methods require a large amount of representative data, which is often hard and expensive to obtain in experiment. An alternative may be generation of additional data with generative neural network systems, e.g. variational autoencoders. In this study, we investigate feasibility of such approach, its merits and difficulties of its use at the example of optical absorption spectroscopy of multicomponent solutions of inorganic salts applied to determine the concentrations of the components of a solution.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S115 - S124"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic Space Embedding of Speech Act Intensions
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601769
A. V. Samsonovich, D. L. Khabarov, N. A. Belyaev
This work presents a semantic map of intensions, understood here as relational connotations of speech acts. The result is a tool that consists of a dataset of intensions, its embedding in a semantic space, and a graph of relations among the intensions, plus a neural network trained to recognize given intensions in utterances. The tool can be used for creating formal representations of social relational aspects of speech acts in a dialogue. The method of constructing the map is based on using OpenAI ChatGPT, fine-tuning a large language model (LLM), linear algebra, and graph theory. The constructed model of semantic space of intensions extends beyond the popular settings for sentiment or tonality analysis of texts in natural language. As a general model applicable to virtually any paradigm of social interaction, it can be used for constructing specialized models of limited paradigms. Therefore, the developed tool can enable efficient integration of LLMs with cognitive architectures, such as eBICA, for building socially emotional conversational agents.
{"title":"Semantic Space Embedding of Speech Act Intensions","authors":"A. V. Samsonovich, D. L. Khabarov, N. A. Belyaev","doi":"10.3103/S1060992X25601769","DOIUrl":"10.3103/S1060992X25601769","url":null,"abstract":"<p>This work presents a semantic map of intensions, understood here as relational connotations of speech acts. The result is a tool that consists of a dataset of intensions, its embedding in a semantic space, and a graph of relations among the intensions, plus a neural network trained to recognize given intensions in utterances. The tool can be used for creating formal representations of social relational aspects of speech acts in a dialogue. The method of constructing the map is based on using OpenAI ChatGPT, fine-tuning a large language model (LLM), linear algebra, and graph theory. The constructed model of semantic space of intensions extends beyond the popular settings for sentiment or tonality analysis of texts in natural language. As a general model applicable to virtually any paradigm of social interaction, it can be used for constructing specialized models of limited paradigms. Therefore, the developed tool can enable efficient integration of LLMs with cognitive architectures, such as eBICA, for building socially emotional conversational agents.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S1 - S15"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Study of Modern Zeroth-Order Optimization Methods for LLM Fine-Tuning
Pub Date: 2025-12-19 | DOI: 10.3103/S1060992X25601770
A. V. Demidovskij, A. I. Trutnev
Large Language Models (LLMs) are widely employed across a broad range of applications due to their versatility and state-of-the-art performance. However, as usage scenarios grow, there is a pressing demand for task-specific adaptation of LLMs through fine-tuning. While full fine-tuning (FT) remains preferred in terms of quality, its high memory and computation requirements limit its practical use, especially for large models. Parameter-efficient fine-tuning (PEFT) techniques such as LoRA mitigate this issue by updating a small subset of model parameters, but they still require extensive resources because of backpropagation. In contrast, zeroth-order (ZO) optimization methods, which approximate gradients using only forward passes, offer an attractive alternative for memory-constrained environments: eliminating backpropagation reduces memory overhead to an inference-level footprint. Over 2024–2025, several ZO techniques have been proposed that aim to balance efficiency and performance. This paper presents a comparative analysis of 12 zeroth-order optimization methods applied to LLM fine-tuning, evaluated by memory utilization, quality, fine-tuning time, and convergence. According to the results, the best method in terms of memory reduction is ZO-SGD-Sign, with a 42.82% memory reduction; the best quality and fine-tuning time among zeroth-order methods relative to SGD are achieved with LoHO, with a 0.6% quality drop and an 11.73% increase in fine-tuning time, while no ZO method currently matches the convergence efficiency of Adam and AdamW.
{"title":"Performance Study of Modern Zeroth-Order Optimization Methods for LLM Fine-Tuning","authors":"A. V. Demidovskij, A. I. Trutnev","doi":"10.3103/S1060992X25601770","DOIUrl":"10.3103/S1060992X25601770","url":null,"abstract":"<p>Large Language Models (LLMs) are widely employed across a broad range of applications due to their versatility and state-of-the-art performance. However, as usage scenarios grow, there is a pressing demand for task-specific adaptation of LLMs through fine-tuning. While full fine-tuning (FT) remains the most preferred in terms of quality, its high memory and computation requirements limit its practical use, especially for LLMs. Parameter-efficient fine-tuning (PEFT) techniques, such as LoRA, mitigate this issue by updating a small subset of model parameters. However, it requires an extensive number of resources due to backpropagation. In contrast, zeroth-order (ZO) optimization methods, which approximate gradients using only forward passes, offer an attractive alternative for memory-constrained environments by eliminating the need for backpropagation, thus reducing memory overhead to inference-level footprints. Over the 2024–2025 year, several ZO techniques have been proposed, aiming to balance efficiency and performance. This paper introduces the comparative analysis of 12 zeroth-order optimization methods applied for the LLM fine-tuning task by memory utilization, quality, fine-tuning time, and convergence. According to the results, the best method in terms of memory reduction is ZO-SGD-Sign: 42.82% memory reduction; the best quality and fine-tuning time across zeroth-order methods compared to SGD is achieved with LoHO: 0.6% quality drop and 11.73% fine-tuning time increase, while no ZO method currently matches the Adam and AdamW convergence efficiency.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"34 1","pages":"S16 - S29"},"PeriodicalIF":0.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}