Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108601
Dingyao Chen, Xiao Teng, Xingyu Shen, Xun Yang, Long Lan
Knowledge distillation (KD) is an effective strategy to transfer learned representations from a pre-trained teacher model to a smaller student model. Current methods for knowledge transfer from convolutional neural networks (CNNs) to vision transformers (ViTs) mainly align output logits. However, such approaches often overlook the rich semantic structures encoded in CNN features, thereby restricting ViTs from effectively inheriting the inductive biases inherent in convolutional architectures. To this end, this paper proposes a Feature-based CNN-to-ViT Structural Knowledge Distillation framework, dubbed FSKD, which combines the semantic structural knowledge embedded in CNN (teacher) features with the strength of ViT (student) in capturing long-range dependencies. Specifically, this framework includes a feature alignment module to bridge the representational gap between CNN and ViT features, and it incorporates a global feature alignment loss. Additionally, we develop patch-wise and attention-wise distillation losses to transfer inter-patch similarity and attention distribution, facilitating semantic structural knowledge transfer from CNNs to ViTs. Experimental results demonstrate that the proposed method considerably enhances ViT performance in visual recognition tasks, particularly in scenarios with limited data. Code is available on GitHub.
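The patch-wise distillation idea described in the abstract can be sketched minimally as follows. This is an illustrative NumPy sketch assuming an MSE penalty between cosine inter-patch similarity matrices; the function names and the exact loss form are assumptions, and the paper's formulation may differ:

```python
import numpy as np

def inter_patch_similarity(feats):
    """Cosine similarity between every pair of patch features.

    feats: (num_patches, dim) array of patch embeddings.
    Returns a (num_patches, num_patches) similarity matrix.
    """
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def patchwise_distill_loss(teacher_feats, student_feats):
    """MSE between teacher and student inter-patch similarity structures.

    Transfers the *relational* structure of CNN features rather than the
    raw feature values, so the student only needs to match how patches
    relate to one another.
    """
    s_teacher = inter_patch_similarity(teacher_feats)
    s_student = inter_patch_similarity(student_feats)
    return float(np.mean((s_teacher - s_student) ** 2))
```

Because the similarity matrices are built from normalized features, the loss is invariant to the overall scale of either feature map, which is one reason similarity-based transfer can bridge heterogeneous CNN/ViT feature spaces.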
Title: Distilling structural knowledge from CNNs to vision transformers for data-efficient visual recognition (Neural Networks, vol. 199, Article 108601)
Pub Date: 2026-01-25 | DOI: 10.1016/j.neunet.2026.108642
Yao Liang, Yuwei Wang, Yang Li, Yi Zeng
Parameter-efficient fine-tuning (PEFT) reduces the compute and memory demands of adapting large language models, yet standard low-rank adapters (e.g., LoRA) can lag full fine-tuning in performance and stability because they restrict updates to a fixed rank-r subspace. We propose Matrix-Transformation based Low-Rank Adaptation (MTLoRA), a brain-inspired extension that inserts a learnable r × r transformation T into the low-rank update (ΔW = BTA). By endowing the subspace with data-adapted geometry (e.g., rotations, scalings, and shears), MTLoRA reparameterizes the rank-r hypothesis class, improving its conditioning and inductive bias at negligible O(r^2) overhead, and recovers LoRA when T = I_r. We instantiate four structures for T: SHIM (T = C), ICFM (T = CC⊤), CTCM (T = CD), and DTSM (T = C + D), providing complementary inductive biases (change of basis, PSD metric, staged mixing, dual superposition). An optimization analysis shows that T acts as a learned preconditioner within the subspace, yielding spectral-norm step-size bounds and operator-norm variance contraction that stabilize training. Empirically, MTLoRA delivers consistent gains while preserving PEFT efficiency: on GLUE (General Language Understanding Evaluation) with DeBERTaV3-base, MTLoRA improves the average over LoRA by +2.0 points (86.9 → 88.9) and matches AdaLoRA (88.9) without any pruning schedule; on natural language generation with GPT-2 Medium, it raises BLEU on DART by +0.95 and on WebNLG by +0.56; and in multimodal instruction tuning with LLaVA-1.5-7B, DTSM attains the best average (69.91) with ~4.7% trainable parameters, outperforming full fine-tuning and strong PEFT baselines. These results indicate that learning geometry inside the low-rank subspace improves both effectiveness and stability, making MTLoRA a practical, plug-compatible alternative to LoRA for large-model fine-tuning.
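The core reparameterization ΔW = BTA can be sketched in a few lines. This is a minimal NumPy illustration of the update's shape and of how plain LoRA is recovered at T = I_r; the function name is an assumption, not the paper's API:

```python
import numpy as np

def mtlora_delta(B, A, T=None):
    """Low-rank weight update ΔW = B T A with a learnable r×r matrix T.

    B: (d_out, r), A: (r, d_in). With T = I_r this reduces exactly to the
    plain LoRA update ΔW = B A; a general T adds only r^2 parameters.
    """
    r = B.shape[1]
    if T is None:
        T = np.eye(r)  # identity recovers plain LoRA
    return B @ T @ A
```

The design point is that T lives entirely inside the rank-r subspace, so the overhead is O(r^2) regardless of the (much larger) dimensions d_out and d_in.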
Title: Matrix-Transformation based Low-Rank Adaptation (MTLoRA): A brain-inspired method for parameter-efficient fine-tuning (Neural Networks, vol. 199, Article 108642)
Pub Date: 2026-01-24 | DOI: 10.1016/j.neunet.2026.108641
Georg Kohl, Li-Wei Chen, Nils Thuerey
Simulating turbulent flows is crucial for a wide range of applications, and machine learning-based solvers are gaining increasing relevance. However, achieving temporal stability when generalizing to longer rollout horizons remains a persistent challenge for learned PDE solvers. In this work, we analyze whether fully data-driven fluid solvers that use an autoregressive rollout based on conditional diffusion models are a viable option to address this challenge. We investigate accuracy, posterior sampling, spectral behavior, and temporal stability, while requiring that methods generalize to flow parameters beyond the training regime. To benchmark the performance of various flow prediction approaches quantitatively and qualitatively, three challenging 2D scenarios are employed: incompressible flow, transonic flow, and isotropic turbulence. We find that even simple diffusion-based approaches can outperform multiple established flow prediction methods in terms of accuracy and temporal stability, while being on par with state-of-the-art stabilization techniques such as unrolling at training time. Such traditional architectures are superior in terms of inference speed; however, the probabilistic nature of diffusion approaches allows for inferring multiple predictions that align with the statistics of the underlying physics. Overall, our benchmark contains three carefully chosen data sets that are suitable for probabilistic evaluation alongside various established flow prediction architectures.
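The autoregressive rollout at the heart of this setup can be sketched generically: a learned single-step predictor is applied repeatedly, with each output fed back as the next condition. The toy step function below is a stand-in assumption for one conditional diffusion sampling pass, not the benchmarked models:

```python
import numpy as np

def autoregressive_rollout(step_fn, state0, n_steps, rng):
    """Roll out a learned single-step simulator autoregressively.

    step_fn(state, rng) -> next_state plays the role of one conditional
    sampling pass; each prediction becomes the next input, so both errors
    and (for diffusion models) sampled variability accumulate over time.
    """
    trajectory = [state0]
    state = state0
    for _ in range(n_steps):
        state = step_fn(state, rng)
        trajectory.append(state)
    return np.stack(trajectory)

def toy_step(state, rng):
    """Hypothetical stand-in: damped dynamics plus noise."""
    return 0.9 * state + 0.01 * rng.normal(size=state.shape)
```

Running the rollout twice with different random streams yields different but statistically similar trajectories, which is exactly the posterior-sampling property the benchmark evaluates.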
Title: Benchmarking autoregressive conditional diffusion models for turbulent flow simulation (Neural Networks, vol. 199, Article 108641)
Pub Date: 2026-01-22 | DOI: 10.1016/j.neunet.2026.108615
Xinxin Li, Juan Zhang, Da Li, Xingyu Liu, Jin Xu, Junping Yin
Automatically discovering mathematical expressions that precisely depict natural phenomena is a challenging problem, and Symbolic Regression (SR) is one of the most widely used techniques for it. Mainstream SR algorithms search for an optimal symbolic tree, but the increasing complexity of the tree structure often limits their performance. Inspired by neural networks, symbolic networks have emerged as a promising new paradigm. However, existing symbolic networks still face certain challenges: binary nonlinear operators { × , ÷} cannot be naturally extended to multivariate forms, and training with a fixed architecture often leads to higher complexity and overfitting. In this work, we propose a Unified Symbolic Network (UniSymNet) that unifies nonlinear binary operators into nested unary operators, thereby transforming them into multivariate operators. The capability of the proposed UniSymNet is established by rigorous theoretical proof, resulting in lower complexity and stronger expressivity. Unlike conventional neural network training, we design a bi-level optimization framework: the outer level pre-trains a Transformer with a sparse label encoding scheme to guide UniSymNet structure selection, while the inner level employs objective-specific strategies to optimize network parameters. This allows for flexible adaptation of UniSymNet structures to different data, leading to reduced expression complexity. UniSymNet is evaluated on low-dimensional Standard Benchmarks and high-dimensional SRBench, and shows an excellent symbolic solution rate, high fitting accuracy, and relatively low expression complexity.
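One concrete way to unify { ×, ÷ } into nested unary operators is the classical log-exp identity, which is multivariate by construction. This sketch illustrates that idea for positive inputs; it is an assumed illustration of the principle, and the paper's actual operator construction may differ:

```python
import numpy as np

def product_via_unary(xs):
    """x1 * x2 * ... * xn = exp(log x1 + ... + log xn), positive inputs.

    The binary operator × becomes a sum wrapped in unary exp/log nodes,
    so a single network node can multiply any number of inputs.
    """
    xs = np.asarray(xs, dtype=float)
    return float(np.exp(np.sum(np.log(xs))))

def divide_via_unary(x, y):
    """x / y = exp(log x - log y), for positive x and y."""
    return float(np.exp(np.log(x) - np.log(y)))
```

Replacing binary × and ÷ nodes with sums of unary log terms is what lets a fixed-width layer act as a multivariate operator instead of a tree of pairwise products.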
Title: UniSymNet: A Unified Symbolic Network with Sparse Encoding and Bi-level Optimization (Neural Networks, vol. 199, Article 108615)
Pub Date: 2026-01-22 | DOI: 10.1016/j.neunet.2026.108620
Zongshen Mu, Yujie Wan, Yueting Zhuang, Jie Tan, Hong Cheng, Yueyang Wang
Financial sentiment analysis (FSA) refers to the task of classifying textual content into predefined sentiment categories to analyze its potential impact on financial market fluctuations. However, directly applying pre-trained large language models (LLMs) to FSA still poses significant challenges. Existing approaches fail to align with domain-specific objectives and struggle to adapt to customized financial data schemas. Moreover, these LLMs predict a stock's movement primarily from that stock's own information, failing to account for cross-impact among related stocks. In this paper, we propose a novel framework that synergizes an LLM with a Graph Neural Network (GNN) to model stock price dynamics, leveraging stock sentiment signals extracted from financial news. Specifically, we employ the open-source Llama-3-8B model as the backbone, then enhance its sensitivity to financial sentiment patterns through supervised fine-tuning (SFT) and direct preference optimization (DPO) techniques. Leveraging the sentiment outputs from the fine-tuned LLM, we design a GNN to enhance stock representations and model cross-asset dependencies via two types of text-attributed graphs, which dynamically encode time-varying price correlations. Experiments on the Chinese A-share market demonstrate that financial sentiment significantly influences stock price variations.
Our framework outperforms previous baselines and exhibits an average improvement of 50% in Sharpe ratio.
Title: Exploring financial sentiment analysis via fine-tuning large language model and attributed graph neural network (Neural Networks, vol. 199, Article 108620)
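The price-correlation graphs described in the abstract above can be sketched as follows: build an adjacency matrix from return correlations over a window, then run one mean-aggregation message-passing step. This is an illustrative NumPy sketch under assumed thresholding and aggregation choices, not the paper's GNN:

```python
import numpy as np

def correlation_graph(returns, threshold=0.5):
    """Link stocks whose return series are strongly correlated.

    returns: (n_days, n_stocks). Returns a binary adjacency matrix
    (no self-loops); recomputing it per window makes the graph
    time-varying, as in the abstract.
    """
    corr = np.corrcoef(returns.T)               # (n_stocks, n_stocks)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj

def propagate(features, adj):
    """One mean-aggregation message-passing step over the graph."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    return (adj @ features) / deg
```

After one propagation step, each stock's representation mixes in information from correlated assets, which is the cross-impact the LLM-only baselines miss.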
Pub Date: 2026-01-22 | DOI: 10.1016/j.neunet.2026.108606
Shanzhi Gu, Zhaoyang Qu, Ruotong Geng, Mingyang Geng, Shangwen Wang, Chuanfu Xu, Haotian Wang, Zhipeng Lin, Dezun Dong
Large Language Models for Code (LLMs4Code) have achieved strong performance in code generation, but recent studies reveal that they may memorize and leak sensitive information contained in training data, posing serious privacy risks. To address this gap, this work presents the first comprehensive empirical study on applying machine unlearning to mitigate sensitive information leakage in LLMs4Code. We first construct a dedicated benchmark that includes: (i) a synthetic forget set containing diverse forms of personal information, and (ii) a retain set designed to evaluate whether code-generation capability is preserved after unlearning. Using this benchmark, we systematically assess three representative unlearning algorithms (GA, GA+GD, GA+KL) across three widely used open-source LLMs4Code models (AIXCoder-7B, CodeLlama-7B, CodeQwen-7B). Experimental results demonstrate that machine unlearning can substantially reduce direct memorization-based leakage: on average, the direct leak rate drops by more than 50% while over 91% of the original code-generation performance is retained. Moreover, by analyzing post-unlearning outputs, we uncover a consistent shift from direct to indirect leakage, revealing an underexplored vulnerability that persists even when the target data has been successfully forgotten. Our findings show that machine unlearning is a feasible and effective solution for enhancing privacy protection in LLMs4Code, while also highlighting the need for future techniques capable of mitigating both direct and indirect leakage simultaneously.
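The GA+GD family of unlearning updates can be sketched at the level of a single parameter step: ascend the loss on the forget set while descending it on the retain set. This is a schematic sketch of the update rule only (the coefficient alpha and the combination are assumed conventions), not the benchmarked training loop:

```python
import numpy as np

def unlearning_step(theta, grad_forget, grad_retain, lr=0.05, alpha=1.0):
    """One GA+GD update on parameters theta.

    - Gradient *descent* on the retain loss (-lr * grad_retain) preserves
      code-generation capability.
    - Gradient *ascent* on the forget loss (+lr * alpha * grad_forget)
      pushes the model away from memorized sensitive sequences.
    """
    return theta - lr * grad_retain + lr * alpha * grad_forget
```

Plain GA corresponds to alpha > 0 with grad_retain set to zero; GA+KL replaces the retain-loss gradient with the gradient of a KL term toward the original model's output distribution.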
Title: Mitigating sensitive information leakage in LLMs4Code through machine unlearning (Neural Networks, vol. 198, Article 108606)
Pub Date: 2026-01-21 | DOI: 10.1016/j.neunet.2026.108630
Deyu Chen, Caicai Guo, Qiyuan Li, Jinguang Gu, Meiyi Xie, Hong Zhu
Lifelong knowledge graph embedding (KGE) methods aim to learn new knowledge continuously while retaining old knowledge. This line of work has received much attention for its potential to enable knowledge retention and transfer and to reduce training costs as knowledge graphs grow in scale and flexibility. However, embedding space drift under different contexts is a crucial cause of catastrophic forgetting and inefficient learning of new facts, and existing work ignores this perspective. To address these issues, we propose a novel lifelong KGE framework that considers learning new facts and preserving old facts from a unified perspective. We propose a diffusion-based embedding method that captures the contextual variation of entity representations and obtains transferable embeddings. To handle the drift of the embedding space and balance learning efficiency, we adopt a reconstruction and generation strategy based on contrastive learning. To avoid catastrophic forgetting and maintain the stability of the embedding distribution, we propose an effective distribution regularization method. We conduct extensive experiments on seven benchmark datasets with different construction strategies and incremental speeds.
Experimental results show that our proposed framework outperforms existing lifelong KGE methods.
Title: Lifelong knowledge graph embedding via diffusion model (Neural Networks, vol. 199, Article 108630)
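A distribution regularizer of the kind described in the abstract above can be sketched by penalizing drift in the first two moments of the embedding matrix between snapshots. This moment-matching form is an assumption for illustration; the paper's regularizer may use a different divergence:

```python
import numpy as np

def distribution_reg(old_emb, new_emb):
    """Penalize drift between old and new embedding distributions.

    old_emb, new_emb: (n_entities, dim) embedding matrices from two
    learning stages. Matches means and covariances, so the new space can
    move individual entities while keeping its overall geometry stable.
    """
    mean_gap = np.sum((old_emb.mean(axis=0) - new_emb.mean(axis=0)) ** 2)
    cov_gap = np.sum((np.cov(old_emb.T) - np.cov(new_emb.T)) ** 2)
    return float(mean_gap + cov_gap)
```

Added to the task loss with a small weight, this term leaves per-entity updates free but discourages wholesale rotation or rescaling of the embedding space, the drift blamed for catastrophic forgetting.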
Pub Date: 2026-01-21 | DOI: 10.1016/j.neunet.2026.108631
Nianyi Wang, Shuai Zheng, Yu Chen, Hai Zhao, Zhou Fang
Learning-based fluid simulation has emerged as an efficient alternative to traditional Navier-Stokes solvers. However, existing neural methods that build upon Smoothed Particle Hydrodynamics (SPH) predominantly rely on local particle interactions, which induces instability in complex scenarios due to error accumulation. To address this, we introduce FluidFormer, a novel architecture that establishes a hierarchical local-global modeling paradigm. The core of our model is the Fluid Attention Block (FAB), a co-design that orchestrates continuous convolution for locality with self-attention for global correction of long-range hydrodynamic phenomena. Embedded in a dual-pipeline network, our approach seamlessly fuses inductive physical biases with structured global reasoning. Extensive experiments show that FluidFormer achieves state-of-the-art performance, with significantly improved stability and generalization in challenging fluid scenes, demonstrating its potential as a robust simulator for complex physical systems.
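The local-plus-global structure of a block like the FAB can be sketched schematically: a local neighborhood mixer (standing in for continuous convolution) combined residually with plain softmax self-attention over all particles. Every detail here (index-order neighborhoods, the residual fusion) is an assumption for illustration, not the published architecture:

```python
import numpy as np

def local_average(x, radius=1):
    """Local mixing: average each particle's feature with nearby particles
    (here simply neighbors in index order, a stand-in for a spatial
    continuous-convolution kernel)."""
    n = len(x)
    out = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out[i] = x[lo:hi].mean(axis=0)
    return out

def global_attention(x):
    """Global mixing: plain softmax self-attention over all particles."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def fluid_attention_block(x):
    """Residual fusion of a local term and a global corrective term."""
    return x + local_average(x) + global_attention(x)
```

The point of the fusion is that the attention path can inject long-range corrections that a purely local SPH-style operator cannot represent, which is the stability argument made in the abstract.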
Title: FluidFormer: Transformer with continuous convolution for particle-based fluid simulation (Neural Networks, vol. 198, Article 108631)
Pub Date : 2026-01-21DOI: 10.1016/j.neunet.2026.108618
Xiaobo Li , Xiaodi Hou , Shilong Wang , Hongfei Lin , Yijia Zhang
Drug recommendation systems have garnered considerable interest in healthcare, striving to offer precise and customized drug prescriptions that align with patients’ specific health needs. However, existing methods primarily focus on modeling temporal dependencies between visits for patients with multiple encounters, often neglecting the challenge of data sparsity in single-visit patients. To address the above limitation, we propose a novel Relation-aware Pre-trained Network with hierarchical aggregation mechanism for drug recommendation (RPNet), which employs a pre-training and fine-tuning framework to enhance drug recommendation in cold-start scenarios. Specifically, we introduce: 1) A code matching discrimination task during pre-training, designed to model the complex relationships between diagnosis and procedure entities. This task employs a mask-replace contrastive learning strategy, which pulls similar samples closer while pushing dissimilar ones apart, thereby capturing robust feature representations; 2) A hierarchical aggregation mechanism that enhances drug information integration by first selecting relevant visits based on rarity discrimination and then retrieving similar patients’ drug insights via similarity matching during fine-tuning. Extensive experiments on two real-world datasets demonstrate the superiority of the proposed RPNet, notably improving the F1 metric by 1.32% and 1.19%. The code of our model is available at https://github.com/Lxb0102/RPNet.
{"title":"Relation-aware pre-trained network with hierarchical aggregation mechanism for cold-start drug recommendation","authors":"Xiaobo Li , Xiaodi Hou , Shilong Wang , Hongfei Lin , Yijia Zhang","doi":"10.1016/j.neunet.2026.108618","DOIUrl":"10.1016/j.neunet.2026.108618","url":null,"abstract":"<div><div>Drug recommendation systems have garnered considerable interest in healthcare, striving to offer precise and customized drug prescriptions that align with patients’ specific health needs. However, existing methods primarily focus on modeling temporal dependencies between visits for patients with multiple encounters, often neglecting the challenge of data sparsity in single-visit patients. To address the above limitation, we propose a novel Relation-aware Pre-trained Network with hierarchical aggregation mechanism for drug recommendation (RPNet), which employs a pre-training and fine-tuning framework to enhance drug recommendation in cold-start scenarios. Specifically, we introduce: 1) A code matching discrimination task during pre-training, designed to model the complex relationships between diagnosis and procedure entities. This task employs a mask-replace contrastive learning strategy, which pulls similar samples closer while pushing dissimilar ones apart, thereby capturing robust feature representations; 2) A hierarchical aggregation mechanism that enhances drug information integration by first selecting relevant visits based on rarity discrimination and then retrieving similar patients’ drug insights via similarity matching during fine-tuning. Extensive experiments on two real-world datasets demonstrate the superiority of the proposed RPNet, notably improving the F1 metric by 1.32% and 1.19%. 
The code of our model is available at <span><span>https://github.com/Lxb0102/RPNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108618"},"PeriodicalIF":6.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
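The mask-replace contrastive strategy described in the RPNet abstract pulls similar samples closer while pushing dissimilar ones apart; an InfoNCE-style objective is one standard way to realize such a discrimination task. The sketch below is a generic assumption of that pattern, not the paper's actual loss: `anchor` stands for a visit embedding, `positive` for its masked view, and `negatives` for replaced (mismatched) views.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss: the positive (e.g. a masked view of
    the same record) is pulled toward the anchor, while replaced negatives
    are pushed away. Embeddings are L2-normalized, tau is a temperature."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(positive), norm(negatives)
    pos_sim = (a * p).sum() / tau            # scalar cosine similarity
    neg_sims = n @ a / tau                   # (k,) similarities to negatives
    logits = np.concatenate([[pos_sim], neg_sims])
    logits -= logits.max()                   # numerically stable softmax
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

A well-aligned positive yields a lower loss than a mismatched one, which is exactly the "pull similar, push dissimilar" behavior the abstract attributes to the code matching discrimination task.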
Pub Date : 2026-01-21DOI: 10.1016/j.neunet.2026.108628
Daixun Li , Weiying Xie , Leyuan Fang , Yunke Wang , Zirui Li , Mingxiang Cao , Jitao Ma , Yunsong Li , Chang Xu
Significant progress has been made in the application of transformer architectures for multimodal tasks. However, current methods such as the self-attention mechanism rarely consider the benefits that feature complementarity and consistency between different modalities bring to fusion, leading to obstacles such as redundant fusion or incomplete representation. Inspired by topological homology groups, we introduce MMFormer, a novel semi-supervised algorithm for high-dimensional multimodal fusion. This method is engineered to capture comprehensive representations by enhancing the interactivity between modal mappings. Specifically, we enforce representational consistency between these heterogeneous representations through a complete dictionary lookup and homology space in the encoder, and establish an exclusivity-aware mapping of the two modalities to emphasize their complementary information, serving as a powerful supplement for multimodal feature interpretation. Moreover, the model attempts to alleviate the challenge of sparse annotations in high-dimensional multimodal data by introducing a joint consistency regularization term. We formulate these objectives into a unified end-to-end optimization framework and are the first to explore and derive the application of semi-supervised vision transformers in high-dimensional multimodal data fusion. Extensive experiments across three benchmarks demonstrate the superiority of MMFormer. Specifically, the model improves overall accuracy by 3.12% on Houston2013, 1.86% on Augsburg, and 1.66% on MUUFL compared with the strongest existing methods, confirming its robustness and effectiveness under sparse annotation conditions. The code is available at https://github.com/LDXDU/MMFormer.
{"title":"MMFormer: Multi-Modality semi-Supervised vision transformer in remote sensing imagery classification","authors":"Daixun Li , Weiying Xie , Leyuan Fang , Yunke Wang , Zirui Li , Mingxiang Cao , Jitao Ma , Yunsong Li , Chang Xu","doi":"10.1016/j.neunet.2026.108628","DOIUrl":"10.1016/j.neunet.2026.108628","url":null,"abstract":"<div><div>Significant progress has been made in the application of transformer architectures for multimodal tasks. However, current methods such as the self-attention mechanism rarely consider the benefits that feature complementarity and consistency between different modalities bring to fusion, leading to obstacles such as redundant fusion or incomplete representation. Inspired by topological homology groups, we introduce MMFormer, a novel semi-supervised algorithm for high-dimensional multimodal fusion. This method is engineered to capture comprehensive representations by enhancing the interactivity between modal mappings. Specifically, we enforce representational consistency between these heterogeneous representations through a complete dictionary lookup and homology space in the encoder, and establish an exclusivity-aware mapping of the two modalities to emphasize their complementary information, serving as a powerful supplement for multimodal feature interpretation. Moreover, the model attempts to alleviate the challenge of sparse annotations in high-dimensional multimodal data by introducing a joint consistency regularization term. We formulate these objectives into a unified end-to-end optimization framework and are the first to explore and derive the application of semi-supervised vision transformers in high-dimensional multimodal data fusion. Extensive experiments across three benchmarks demonstrate the superiority of MMFormer. 
Specifically, the model improves overall accuracy by 3.12% on Houston2013, 1.86% on Augsburg, and 1.66% on MUUFL compared with the strongest existing methods, confirming its robustness and effectiveness under sparse annotation conditions. The code is available at <span><span>https://github.com/LDXDU/MMFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108628"},"PeriodicalIF":6.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
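A consistency regularizer for sparse annotations, as invoked in the MMFormer abstract, is commonly realized by penalizing disagreement between the class distributions that two modality branches predict for the same unlabeled pixels. The sketch below assumes that generic formulation (mean squared error between softmax outputs); the actual term in the paper may differ.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_a, logits_b):
    """Hypothetical joint consistency regularizer: mean squared error
    between the class distributions predicted from two modality branches
    (e.g. hyperspectral vs. LiDAR) on unlabeled samples."""
    p, q = softmax(logits_a), softmax(logits_b)
    return np.mean((p - q) ** 2)
```

The term is zero when the branches agree and grows with their disagreement, so minimizing it on unlabeled data lets each modality supervise the other without extra annotations.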