
Latest publications in Neural Networks

A cortico-cerebellar neural model for task control under incomplete instructions
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108648
Lanyun Cui , Ying Yu , Qingyun Wang , Guanrong Chen
Cerebellar-inspired motor control systems have been widely explored in robotics to achieve biologically plausible movement generation. However, most existing models rely heavily on high-dimensional instruction inputs during training, diverging from the input-efficient control observed in biological systems. In humans, effective motor learning is often based on sparse or incomplete external feedback, a capability possibly attributable to interactions among multiple brain regions, especially the cortex and the cerebellum. In this study, we present a hierarchical cortico-cerebellar neural network model that investigates the neural mechanisms enabling motor control under incomplete or low-dimensional instructions. The evaluation results, measured by two complementary levels of evaluation metrics, demonstrate that the cortico-cerebellar model reduces dependency on external instruction without compromising trajectory smoothness. The model features a division of roles: the cortical network handles high-level action selection, while the cerebellar network executes motor commands through torque control, operating directly on a planar arm. Additionally, the cortex exhibits enhanced exploration, driven indirectly by the stochastic characteristics of cerebellar torque control. Our results show that cortico-cerebellar coordination can facilitate robust and flexible control even with sparse instruction signals, suggesting a potential mechanism by which biological systems achieve efficient behavior under informational constraints.
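To make the described division of roles concrete, the following minimal sketch pairs a "cortical" network for high-level action selection with a stochastic "cerebellar" network that outputs joint torques for a two-link planar arm. This is an illustrative reading of the abstract, not the authors' model; all module names, dimensions, and the noise level are assumptions.

```python
# Illustrative sketch of the cortico-cerebellar division of roles (assumed,
# not the paper's code): cortex picks a high-level action, cerebellum turns
# it into noisy joint torques for a planar two-link arm.
import torch
import torch.nn as nn

class CorticalNet(nn.Module):          # high-level action selection
    def __init__(self, state_dim=6, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))
    def forward(self, state):
        return torch.softmax(self.net(state), dim=-1)   # action probabilities

class CerebellarNet(nn.Module):        # low-level torque control
    def __init__(self, state_dim=6, n_actions=4, n_joints=2, noise_std=0.05):
        super().__init__()
        self.noise_std = noise_std     # stochastic torques drive exploration
        self.net = nn.Sequential(nn.Linear(state_dim + n_actions, 128),
                                 nn.Tanh(), nn.Linear(128, n_joints))
    def forward(self, state, action_probs):
        torque = self.net(torch.cat([state, action_probs], dim=-1))
        return torque + self.noise_std * torch.randn_like(torque)

state = torch.randn(1, 6)              # joint angles, velocities, target cue
cortex, cerebellum = CorticalNet(), CerebellarNet()
torques = cerebellum(state, cortex(state))   # shape (1, 2): one torque per joint
```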
Citations: 0
Reinforcement learning via conservative agent for environments with random delays
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108645
Jongsoo Lee , Jangwon Kim , Jiseok Jeong , Soohee Han
Real-world reinforcement learning applications are often subject to unavoidable delayed feedback from the environment. Under such conditions, the standard state representation may no longer induce Markovian dynamics unless additional information is incorporated at decision time, which introduces significant challenges for both learning and control. While numerous delay-compensation methods have been proposed for environments with constant delays, environments with random delays remain largely unexplored due to their inherent variability and unpredictability. In this study, we propose a robust agent for decision-making under bounded random delays, termed the conservative agent. This agent reformulates the random-delay environment into a constant-delay surrogate, which enables any constant-delay method to be directly extended to random-delay environments without modifying its algorithmic structure. Apart from the maximum delay, the conservative agent does not require prior knowledge of the underlying delay distribution, and its performance is invariant to changes in the delay distribution as long as the maximum delay remains unchanged. We present a theoretical analysis of the conservative agent and evaluate its performance on diverse continuous control tasks from the MuJoCo benchmarks. Empirical results demonstrate that it significantly outperforms existing baselines in terms of both asymptotic performance and sample efficiency.
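The constant-delay surrogate idea can be illustrated with a small wrapper: if every observation arrives with a random delay of at most d_max steps, an agent that always acts on the observation generated exactly d_max steps ago experiences a constant-delay environment. A minimal sketch, assuming such a bounded-delay interface (not the paper's implementation):

```python
# Minimal sketch of the constant-delay surrogate (an assumed reading of the
# abstract, not the paper's code). Observations may arrive out of order with
# random delay <= d_max; acting on the observation generated exactly d_max
# steps ago yields a constant-delay view of the environment.
class ConservativeDelayWrapper:
    def __init__(self, d_max):
        self.d_max = d_max
        self.arrived = {}          # generation step -> observation
        self.t = 0                 # current environment step

    def receive(self, gen_step, obs):
        """Called whenever a delayed observation arrives (possibly out of order)."""
        self.arrived[gen_step] = obs

    def observation_for_policy(self):
        """Observation generated d_max steps ago. Because every delay is
        bounded by d_max, it is guaranteed to have arrived by now."""
        return self.arrived.pop(self.t - self.d_max, None)

    def advance(self):
        self.t += 1
```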
Citations: 0
Improved exponential stability of time delay neural networks via separated-matrix-based integral inequalities
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108643
Yuanyuan Zhang, Xinzuo Ma, Seakweng Vong
This paper studies the exponential stability of neural networks with time delays. A separated-matrix-based integral inequality is proposed to incorporate more delay information. It not only reflects the information of each component in the state-related vector but also considers the cross terms among the three components, significantly reducing the inherent conservativeness of traditional methods. By constructing a Lyapunov-Krasovskii functional with separated-matrix-based integral terms and a linear matrix inequality framework via quadratic negative definiteness, less conservative stability criteria are established. Two numerical examples demonstrate the method's superiority over existing approaches in terms of maximum allowable delay bounds and computational efficiency.
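For context, the classical Jensen integral inequality is the kind of bound that separated-matrix-based inequalities refine; the paper's new inequality is not reproduced here. The standard result, for a delay h > 0 and a positive definite matrix R:

```latex
% Background, not the paper's new result: the classical Jensen integral
% inequality that separated-matrix-based inequalities are designed to refine.
% For a delay $h > 0$, a matrix $R \succ 0$, and a continuous signal $x(\cdot)$:
\[
  \left( \int_{t-h}^{t} x(s)\,\mathrm{d}s \right)^{\top}
  R
  \left( \int_{t-h}^{t} x(s)\,\mathrm{d}s \right)
  \;\le\;
  h \int_{t-h}^{t} x(s)^{\top} R\, x(s)\,\mathrm{d}s .
\]
```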
Citations: 0
HpMiX: A disease ceRNA biomarker prediction framework driven by graph topology-constrained Mixup and hypergraph residual enhancement
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108662
Xinfei Wang , Lan Huang , Yan Wang , Renchu Guan , Zhuhong You , Fengfeng Zhou , Yuqing Li , Yuan Fu
The competing endogenous RNA (ceRNA) regulatory network (CENA) plays a critical role in elucidating the molecular mechanisms of diseases. However, existing computational methods primarily focus on modeling local topological structures of biological networks, struggling to capture high-order regulatory relationships and global topological structures, thus limiting a deeper understanding of complex regulatory interactions.
To address this, we propose HpMiX, a Graph Topology-Constrained Mixup (GTCM) and hypergraph residual enhancement learning framework for the discovery of disease-related ceRNA biomarkers. This framework first constructs a CENA network encompassing multi-molecule associations, including miRNA, lncRNA, circRNA, and mRNA, and models higher-order regulatory relationships using K-hop hyperedges. Biologically meaningful initial features are then extracted from CENA via a multi-structure hypergraph weighted random walk method (MHWRW), integrating prior biological knowledge and regulatory information. Subsequently, graph topology-constrained Mixup and multi-head attention, combined with a residual hypergraph neural network, are employed to generate robust node embeddings with both local and global context, enabling the identification of potential disease-ceRNA biomarkers.
Prediction results across multiple disease biomarkers demonstrate that HpMiX significantly outperforms state-of-the-art methods, validating its effectiveness in biological regulatory network representation learning. Case studies further confirm that the framework can effectively identify differentially expressed ceRNAs in diseases, highlighting its potential as a tool for pre-screening high-probability disease biomarkers.
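As a rough illustration of graph-topology-constrained Mixup, the sketch below mixes each node's features only with those of a sampled graph neighbor, so interpolation respects the topology. The paper's GTCM may differ substantially; the Beta parameter and the one-neighbor pairing rule are assumptions.

```python
# Rough sketch of topology-constrained Mixup (the paper's GTCM may differ):
# each node is interpolated only with an in-neighbor, so the mixing respects
# graph structure. alpha and the pairing rule are assumptions.
import torch

def topology_constrained_mixup(x, edge_index, alpha=0.4):
    """x: (N, F) node features; edge_index: (2, E) directed edges (src, dst)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    partner = torch.arange(x.size(0))      # default partner: the node itself
    src, dst = edge_index
    partner[dst] = src                     # one in-neighbor per node (last wins)
    mixed = lam * x + (1.0 - lam) * x[partner]
    return mixed, lam
```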
Citations: 0
Distilling structural knowledge from CNNs to vision transformers for data-efficient visual recognition
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108601
Dingyao Chen , Xiao Teng , Xingyu Shen , Xun Yang , Long Lan
Knowledge distillation (KD) is an effective strategy to transfer learned representations from a pre-trained teacher model to a smaller student model. Current methods for knowledge transfer from convolutional neural networks (CNNs) to vision transformers (ViTs) mainly align output logits. However, such approaches often overlook the rich semantic structures encoded in CNN features, thereby restricting ViTs from effectively inheriting the inductive biases inherent in convolutional architectures. To this end, this paper proposes a Feature-based CNN-to-ViT Structural Knowledge Distillation framework, dubbed FSKD, which combines the semantic structural knowledge embedded in CNN (teacher) features with the strength of ViT (student) in capturing long-range dependencies. Specifically, this framework includes a feature alignment module to bridge the representational gap between CNN and ViT features, and it incorporates a global feature alignment loss. Additionally, we develop patch-wise and attention-wise distillation losses to transfer inter-patch similarity and attention distribution, facilitating semantic structural knowledge transfer from CNNs to ViTs. Experimental results demonstrate that the proposed method considerably enhances ViT performance in visual recognition tasks, particularly under scenarios with limited data. Code is available at Github.
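A patch-wise structural distillation loss of the kind described can be sketched as aligning inter-patch similarity matrices between teacher CNN features and student ViT tokens. This is a generic instance, not the FSKD code; shapes and the MSE objective are assumptions.

```python
# Hedged sketch of a patch-wise similarity distillation loss in the spirit of
# FSKD (not the authors' implementation): align the inter-patch similarity
# matrix of CNN teacher features with that of ViT student tokens.
import torch
import torch.nn.functional as F

def patch_similarity_loss(t_feat, s_tokens):
    """t_feat: (B, C, H, W) teacher feature map; s_tokens: (B, N, D) student
    tokens, with N == H * W patches."""
    t = t_feat.flatten(2).transpose(1, 2)            # (B, H*W, C)
    t = F.normalize(t, dim=-1)
    s = F.normalize(s_tokens, dim=-1)
    sim_t = t @ t.transpose(1, 2)                    # (B, N, N) teacher similarities
    sim_s = s @ s.transpose(1, 2)                    # (B, N, N) student similarities
    return F.mse_loss(sim_s, sim_t)
```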
Citations: 0
Matrix-Transformation based Low-Rank Adaptation (MTLoRA): A brain-inspired method for parameter-efficient fine-tuning
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-25 | DOI: 10.1016/j.neunet.2026.108642
Yao Liang , Yuwei Wang , Yang Li , Yi Zeng
Parameter-efficient fine-tuning (PEFT) reduces the compute and memory demands of adapting large language models, yet standard low-rank adapters (e.g., LoRA) can lag full fine-tuning in performance and stability because they restrict updates to a fixed rank-r subspace. We propose Matrix-Transformation based Low-Rank Adaptation (MTLoRA), a brain-inspired extension that inserts a learnable r × r transformation T into the low-rank update (ΔW = BTA). By endowing the subspace with data-adapted geometry (e.g., rotations, scalings, and shears), MTLoRA reparameterizes the rank-r hypothesis class, improving its conditioning and inductive bias at negligible O(r²) overhead, and recovers LoRA when T = I_r. We instantiate four structures for T: SHIM (T = C), ICFM (T = CC^⊤), CTCM (T = CD), and DTSM (T = C + D), providing complementary inductive biases (change of basis, PSD metric, staged mixing, dual superposition). An optimization analysis shows that T acts as a learned preconditioner within the subspace, yielding spectral-norm step-size bounds and operator-norm variance contraction that stabilize training. Empirically, MTLoRA delivers consistent gains while preserving PEFT efficiency: on GLUE (General Language Understanding Evaluation) with DeBERTaV3-base, MTLoRA improves the average over LoRA by +2.0 points (86.9 → 88.9) and matches AdaLoRA (88.9) without any pruning schedule; on natural language generation with GPT-2 Medium, it raises BLEU on DART by +0.95 and on WebNLG by +0.56; and in multimodal instruction tuning with LLaVA-1.5-7B, DTSM attains the best average (69.91) with ∼4.7% trainable parameters, outperforming full fine-tuning and strong PEFT baselines. These results indicate that learning geometry inside the low-rank subspace improves both effectiveness and stability, making MTLoRA a practical, plug-compatible alternative to LoRA for large-model fine-tuning.
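The core update ΔW = BTA is easy to state in code. Below is a minimal sketch of such an adapter around a frozen linear layer, with T initialized to the identity so that it starts as plain LoRA; initialization and scaling details are assumptions, not the authors' implementation.

```python
# Minimal sketch of an MTLoRA-style adapter: Delta W = B T A, with A (r x d_in)
# and B (d_out x r) the usual LoRA factors and T a learnable r x r transform.
# With T fixed to the identity this reduces to plain LoRA. Initialization and
# scaling choices below are assumptions.
import torch
import torch.nn as nn

class MTLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, scale=1.0):
        super().__init__()
        self.base = base                               # frozen pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))   # zero-init: no initial shift
        self.T = nn.Parameter(torch.eye(r))            # starts as plain LoRA
        self.scale = scale

    def forward(self, x):
        delta = self.B @ self.T @ self.A               # (d_out, d_in)
        return self.base(x) + self.scale * (x @ delta.T)

layer = MTLoRALinear(nn.Linear(512, 512), r=8)
y = layer(torch.randn(4, 512))                         # (4, 512)
```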
Citations: 0
Benchmarking autoregressive conditional diffusion models for turbulent flow simulation
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-24 | DOI: 10.1016/j.neunet.2026.108641
Georg Kohl, Li-Wei Chen, Nils Thuerey
Simulating turbulent flows is crucial for a wide range of applications, and machine learning-based solvers are gaining increasing relevance. However, achieving temporal stability when generalizing to longer rollout horizons remains a persistent challenge for learned PDE solvers. In this work, we analyze whether fully data-driven fluid solvers that utilize an autoregressive rollout based on conditional diffusion models are a viable option to address this challenge. We investigate accuracy, posterior sampling, spectral behavior, and temporal stability, while requiring that methods generalize to flow parameters beyond the training regime. To quantitatively and qualitatively benchmark the performance of various flow prediction approaches, three challenging 2D scenarios are employed, including incompressible and transonic flows as well as isotropic turbulence. We find that even simple diffusion-based approaches can outperform multiple established flow prediction methods in terms of accuracy and temporal stability, while being on par with state-of-the-art stabilization techniques such as unrolling at training time. Such traditional architectures are superior in terms of inference speed; however, the probabilistic nature of diffusion approaches allows for inferring multiple predictions that align with the statistics of the underlying physics. Overall, our benchmark contains three carefully chosen data sets that are suitable for probabilistic evaluation alongside various established flow prediction architectures.
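An autoregressive conditional diffusion rollout can be sketched as follows: each new flow state is sampled by a reverse-diffusion loop conditioned on the previous state. The `denoiser` network, noise schedule, and simplified DDPM update below are assumptions for illustration, not the benchmark's code.

```python
# Hedged sketch of an autoregressive conditional diffusion rollout: each flow
# state is generated by reverse diffusion conditioned on the previous state.
# `denoiser` is a hypothetical trained eps-prediction network; the DDPM update
# is simplified (fixed linear schedule, no learned variance).
import torch

@torch.no_grad()
def rollout(denoiser, x0, horizon, n_steps=50):
    betas = torch.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    states, prev = [x0], x0
    for _ in range(horizon):
        x = torch.randn_like(prev)                    # start from pure noise
        for t in reversed(range(n_steps)):            # reverse diffusion
            eps = denoiser(x, prev, t)                # conditioned on last state
            x = (x - betas[t] / torch.sqrt(1 - abar[t]) * eps) / torch.sqrt(alphas[t])
            if t > 0:
                x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
        states.append(x)
        prev = x                                      # autoregressive conditioning
    return states
```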
Citations: 0
UniSymNet: A Unified Symbolic Network with Sparse Encoding and Bi-level Optimization
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.neunet.2026.108615
Xinxin Li , Juan Zhang , Da Li , Xingyu Liu , Jin Xu , Junping Yin
Automatically discovering mathematical expressions that precisely depict natural phenomena is a challenging problem, and Symbolic Regression (SR) is one of the most widely used techniques for it. Mainstream SR algorithms search for the optimal symbolic tree, but the increasing complexity of the tree structure often limits their performance. Inspired by neural networks, symbolic networks have emerged as a promising new paradigm. However, existing symbolic networks still face certain challenges: binary nonlinear operators {×, ÷} cannot be naturally extended to the multivariate case, and training with a fixed architecture often leads to higher complexity and overfitting. In this work, we propose a Unified Symbolic Network (UniSymNet) that unifies binary nonlinear operators into nested unary operators, thereby transforming them into multivariate operators. The capability of the proposed UniSymNet is established by rigorous theoretical proof, resulting in lower complexity and stronger expressivity. Unlike conventional neural network training, we design a bi-level optimization framework: the outer level pre-trains a Transformer with a sparse label encoding scheme to guide UniSymNet structure selection, while the inner level employs objective-specific strategies to optimize network parameters. This allows flexible adaptation of UniSymNet structures to different data, leading to reduced expression complexity. UniSymNet is evaluated on low-dimensional Standard Benchmarks and high-dimensional SRBench, and shows an excellent symbolic solution rate, high fitting accuracy, and relatively low expression complexity.
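The unification of binary operators into nested unary operators can be illustrated with the classic log-exp identity: for positive inputs, multiplication becomes exp applied to a sum of logs, which extends naturally to any number of variables. A small worked example (UniSymNet's actual operator set and sign handling may differ):

```python
# Illustration of the unification idea (not UniSymNet's exact construction):
# for positive inputs, the binary operators {x, /} become nested unary
# operations (log, exp) around a sum, which is naturally multivariate.
import math

def product_via_unary(xs):
    """prod(xs) = exp(sum(log x)) for x > 0: a multivariate operator."""
    return math.exp(sum(math.log(x) for x in xs))

def divide_via_unary(x, y):
    """x / y = exp(log x - log y) for x, y > 0."""
    return math.exp(math.log(x) - math.log(y))

assert abs(product_via_unary([2.0, 3.0, 4.0]) - 24.0) < 1e-9
assert abs(divide_via_unary(10.0, 4.0) - 2.5) < 1e-9
```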
Citations: 0
Exploring financial sentiment analysis via fine-tuning large language model and attributed graph neural network
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.neunet.2026.108620
Zongshen Mu , Yujie Wan , Yueting Zhuang , Jie Tan , Hong Cheng , Yueyang Wang
Financial sentiment analysis (FSA) refers to the task of classifying textual content into predefined sentiment categories to analyze its potential impact on financial market fluctuations. However, directly applying pre-trained large language models (LLMs) to FSA still poses significant challenges. Existing approaches fail to align with domain-specific objectives and struggle to adapt to customized financial data schemas. Moreover, these LLMs predict stock changes primarily from each stock's own information, failing to take into account cross-impacts among related stocks. In this paper, we propose a novel framework that synergizes an LLM with a Graph Neural Network (GNN) to model stock price dynamics, leveraging stock sentiment signals extracted from financial news. Specifically, we employ the open-source Llama-3-8B model as the backbone, then enhance its sensitivity to financial sentiment patterns through supervised fine-tuning (SFT) and direct preference optimization (DPO) techniques. Leveraging the sentiment outputs from the fine-tuned LLM, we design a GNN to enhance stock representations and model cross-asset dependencies via two types of text-attributed graphs, which dynamically encode time-varying price correlations. Experiments on the Chinese A-share market demonstrate that financial sentiment significantly influences stock price variations. Our framework outperforms previous baselines and exhibits an average improvement of 50% in Sharpe ratio.
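The LLM-to-GNN coupling can be sketched as follows: per-stock sentiment features produced by the fine-tuned LLM become node features on a stock graph whose edge weights encode price correlations, and a message-passing layer mixes each stock's sentiment with that of correlated stocks. Dimensions, the layer form, and the stand-in correlation graph below are assumptions, not the paper's architecture.

```python
# Hedged sketch of the LLM + GNN coupling: LLM-derived sentiment features are
# node features; a single message-passing step mixes each stock's sentiment
# with that of price-correlated stocks. All dimensions are assumptions.
import torch
import torch.nn as nn

class SentimentGNNLayer(nn.Module):
    def __init__(self, in_dim=8, out_dim=8):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        """x: (N, F) per-stock sentiment features;
        adj: (N, N) row-normalized price-correlation weights."""
        return torch.relu(self.lin_self(x) + self.lin_neigh(adj @ x))

x = torch.randn(30, 8)                               # sentiment embeddings, 30 stocks
adj = torch.softmax(torch.randn(30, 30), dim=-1)     # stand-in correlation graph
h = SentimentGNNLayer()(x, adj)                      # (30, 8) enhanced representations
```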
Citations: 0
Mitigating sensitive information leakage in LLMs4Code through machine unlearning
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.neunet.2026.108606
Shanzhi Gu , Zhaoyang Qu , Ruotong Geng , Mingyang Geng , Shangwen Wang , Chuanfu Xu , Haotian Wang , Zhipeng Lin , Dezun Dong
Large Language Models for Code (LLMs4Code) have achieved strong performance in code generation, but recent studies reveal that they may memorize and leak sensitive information contained in training data, posing serious privacy risks. To address this gap, this work presents the first comprehensive empirical study on applying machine unlearning to mitigate sensitive information leakage in LLMs4Code. We first construct a dedicated benchmark that includes: (i) a synthetic forget set containing diverse forms of personal information, and (ii) a retain set designed to evaluate whether code-generation capability is preserved after unlearning. Using this benchmark, we systematically assess three representative unlearning algorithms (GA, GA+GD, GA+KL) across three widely used open-source LLMs4Code models (AIXCoder-7B, CodeLlama-7B, CodeQwen-7B). Experimental results demonstrate that machine unlearning can substantially reduce direct memorization-based leakage: on average, the direct leak rate drops by more than 50% while over 91% of the original code-generation performance is retained. Moreover, by analyzing post-unlearning outputs, we uncover a consistent shift from direct to indirect leakage, revealing an underexplored vulnerability that persists even when the target data has been successfully forgotten. Our findings show that machine unlearning is a feasible and effective solution for enhancing privacy protection in LLMs4Code, while also highlighting the need for future techniques capable of mitigating both direct and indirect leakage simultaneously.
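The three unlearning objectives named above are standard and can be sketched directly: GA performs gradient ascent on the forget set, GA+GD adds a descent term on the retain set, and GA+KL instead regularizes toward the original model's output distribution. A minimal sketch, assuming models that return classifier-style logits and unit loss weights (the study's exact formulation may differ):

```python
# Hedged sketch of the GA / GA+GD / GA+KL unlearning objectives. `model` and
# `ref_model` are hypothetical networks returning logits; loss weights of 1.0
# are assumptions.
import torch
import torch.nn.functional as F

def unlearning_loss(model, ref_model, forget_batch, retain_batch, mode="GA+KL"):
    f_in, f_lab = forget_batch
    loss = -F.cross_entropy(model(f_in), f_lab)        # gradient ascent on forget data
    if mode == "GA+GD":
        r_in, r_lab = retain_batch
        loss = loss + F.cross_entropy(model(r_in), r_lab)   # keep retain performance
    elif mode == "GA+KL":
        r_in, _ = retain_batch
        with torch.no_grad():
            ref_logp = F.log_softmax(ref_model(r_in), dim=-1)
        logp = F.log_softmax(model(r_in), dim=-1)
        # stay close to the original model's distribution on retain data
        loss = loss + F.kl_div(logp, ref_logp, log_target=True,
                               reduction="batchmean")
    return loss
```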
Citations: 0