Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108648
Lanyun Cui , Ying Yu , Qingyun Wang , Guanrong Chen
Cerebellar-inspired motor control systems have been widely explored in robotics to achieve biologically plausible movement generation. However, most existing models rely heavily on high-dimensional instruction inputs during training, diverging from the input-efficient control observed in biological systems. In humans, effective motor learning is often based on sparse or incomplete external feedback, a capability possibly attributable to the interaction between multiple brain regions, especially the cortex and the cerebellum. In this study, we present a hierarchical cortico-cerebellar neural network model that investigates the neural mechanisms enabling motor control under incomplete or low-dimensional instructions. The evaluation results, measured by two complementary levels of evaluation metrics, demonstrate that the cortico-cerebellar model reduces dependency on external instruction without compromising trajectory smoothness. The model features a division of roles: the cortical network handles high-level action selection, while the cerebellar network executes motor commands through torque control, directly operating on a planar arm. Additionally, the cortex exhibits enhanced exploration indirectly driven by the stochastic characteristics of cerebellar torque control. Our results show that cortico-cerebellar coordination can facilitate robust and flexible control even with sparse instruction signals, suggesting a potential mechanism by which biological systems achieve efficient behavior under informational constraints.
{"title":"A cortico-cerebellar neural model for task control under incomplete instructions","authors":"Lanyun Cui , Ying Yu , Qingyun Wang , Guanrong Chen","doi":"10.1016/j.neunet.2026.108648","DOIUrl":"10.1016/j.neunet.2026.108648","url":null,"abstract":"<div><div>Cerebellar-inspired motor control systems have been widely explored in robotics to achieve biologically plausible movement generation. However, most existing models rely heavily on high-dimensional instruction inputs during training, diverging from the input-efficient control observed in biological systems. In humans, effective motor learning often based on sparse or incomplete external feedback. It is possibly attributed to the interaction between multiple brain regions, especially the cortex and the cerebellum. In this study, we present a hierarchical cortico-cerebellar neural network model that investigates the neural mechanisms enabling motor control under incomplete or low-dimensional instructions. The evaluation results, measured by two complementary levels of evaluation metrics, demonstrate that the cortico-cerebellar model reduces dependency on external instruction without compromising trajectory smoothness. The model features a division of roles: the cortical network handles high-level action selection, while the cerebellar network executes motor commands by torque control, directly operating on a planar arm. Additionally, the cortex exhibits enhanced exploration indirectly driven by the stochastic characteristics of cerebellar torque control. Our results show that cortico-cerebellar coordination can facilitate robust and flexible control even with sparse instruction signals, suggesting a potential mechanism by which biological systems achieve efficient behavior under informational constraints.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108648"},"PeriodicalIF":6.3,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146133416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108645
Jongsoo Lee , Jangwon Kim , Jiseok Jeong , Soohee Han
Real-world reinforcement learning applications are often subject to unavoidable delayed feedback from the environment. Under such conditions, the standard state representation may no longer induce Markovian dynamics unless additional information is incorporated at decision time, which introduces significant challenges for both learning and control. While numerous delay-compensation methods have been proposed for environments with constant delays, those with random delays remain largely unexplored due to their inherent variability and unpredictability. In this study, we propose a robust agent for decision-making under bounded random delays, termed the conservative agent. This agent reformulates the random-delay environment into a constant-delay surrogate, which enables any constant-delay method to be directly extended to random-delay environments without modifying its algorithmic structure. Apart from the maximum delay, the conservative agent does not require prior knowledge of the underlying delay distribution, and its performance remains invariant to changes in the delay distribution as long as the maximum delay is unchanged. We present a theoretical analysis of the conservative agent and evaluate its performance on diverse continuous control tasks from the MuJoCo benchmarks. Empirical results demonstrate that it significantly outperforms existing baselines in terms of both asymptotic performance and sample efficiency.
{"title":"Reinforcement learning via conservative agent for environments with random delays","authors":"Jongsoo Lee , Jangwon Kim , Jiseok Jeong , Soohee Han","doi":"10.1016/j.neunet.2026.108645","DOIUrl":"10.1016/j.neunet.2026.108645","url":null,"abstract":"<div><div>Real-world reinforcement learning applications are often subject to unavoidable delayed feedback from the environment. Under such conditions, the standard state representation may no longer induce Markovian dynamics unless additional information is incorporated at decision time, which introduces significant challenges for both learning and control. While numerous delay-compensation methods have been proposed for environments with constant delays, those with random delays remain largely unexplored due to their inherent variability and unpredictability. In this study, we propose a robust agent for decision-making under bounded random delays, termed the <em>conservative agent</em>. This agent reformulates the random-delay environment into a constant-delay surrogate, which enables any constant-delay method to be directly extended to random-delay environments without modifying their algorithmic structure. Apart from a maximum delay, the conservative agent does not require prior knowledge of the underlying delay distribution and maintains performance invariant to changes in the delay distribution as long as the maximum delay remains unchanged. We present a theoretical analysis of conservative agent and evaluate its performance on diverse continuous control tasks from the MuJoCo benchmarks. Empirical results demonstrate that it significantly outperforms existing baselines in terms of both asymptotic performance and sample efficiency.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108645"},"PeriodicalIF":6.3,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108643
Yuanyuan Zhang, Xinzuo Ma, Seakweng Vong
This paper studies the exponential stability of neural networks with time delays. A separated-matrix-based integral inequality is proposed to incorporate more delay information. It not only reflects the information of each component in the state-related vector but also considers the cross terms among the three components, significantly reducing the inherent conservativeness of traditional methods. By constructing a Lyapunov-Krasovskii functional with separated-matrix-based integral terms and a linear matrix inequality framework via quadratic negative definiteness, less conservative stability criteria are established. Two numerical examples demonstrate the method's superiority in maximum allowable delay bounds and computational efficiency compared to existing approaches.
{"title":"Improved exponential stability of time delay neural networks via separated-matrix-based integral inequalities","authors":"Yuanyuan Zhang, Xinzuo Ma, Seakweng Vong","doi":"10.1016/j.neunet.2026.108643","DOIUrl":"10.1016/j.neunet.2026.108643","url":null,"abstract":"<div><div>This paper studies the exponential stability of neural networks with time delays. A separated-matrix-based integral inequality is proposed to incorporate more delay information. It not only reflects the information of each component in the state-related vector but also considers the cross terms among the three components, significantly reducing the inherent conservativeness of traditional methods. By constructing a Lyapunov-Krasovskii functional with separation-matrix-based integral and a linear matrix inequality framework via quadratic negative definiteness, less conservative stability criteria are established. Two numerical examples demonstrate the method superiority in maximum allowable delay bounds and computational efficiency compared to existing approaches.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108643"},"PeriodicalIF":6.3,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146114767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108662
Xinfei Wang , Lan Huang , Yan Wang , Renchu Guan , Zhuhong You , Fengfeng Zhou , Yuqing Li , Yuan Fu
The competing endogenous RNA (ceRNA) regulatory network (CENA) plays a critical role in elucidating the molecular mechanisms of diseases. However, existing computational methods primarily focus on modeling local topological structures of biological networks, struggling to capture high-order regulatory relationships and global topological structures, thus limiting a deeper understanding of complex regulatory interactions.
To address this, we propose HpMiX, a Graph Topology-Constrained Mixup (GTCM) and hypergraph residual enhancement learning framework for the discovery of disease-related ceRNA biomarkers. This framework first constructs a CENA network encompassing multi-molecule associations, including miRNA, lncRNA, circRNA, and mRNA, and models higher-order regulatory relationships using K-hop hyperedges. Biologically meaningful initial features are then extracted from CENA via a multi-structure hypergraph weighted random walk method (MHWRW), integrating prior biological knowledge and regulatory information. Subsequently, graph topology-constrained Mixup and multi-head attention, combined with a residual hypergraph neural network, are employed to generate robust node embeddings with both local and global context, enabling the identification of potential disease-ceRNA biomarkers.
Prediction results across multiple disease biomarkers demonstrate that HpMiX significantly outperforms state-of-the-art methods, validating its effectiveness in biological regulatory network representation learning. Case studies further confirm that the framework can effectively identify differentially expressed ceRNAs in diseases, highlighting its potential as a tool for pre-screening high-probability disease biomarkers.
{"title":"HpMiX: A Disease ceRNA biomarker prediction framework driven by graph topology-constrained Mixup and hypergraph residual enhancement","authors":"Xinfei Wang , Lan Huang , Yan Wang , Renchu Guan , Zhuhong You , Fengfeng Zhou , Yuqing Li , Yuan Fu","doi":"10.1016/j.neunet.2026.108662","DOIUrl":"10.1016/j.neunet.2026.108662","url":null,"abstract":"<div><div>The competing endogenous RNA (ceRNA) regulatory network (CENA) plays a critical role in elucidating the molecular mechanisms of diseases. However, existing computational methods primarily focus on modeling local topological structures of biological networks, struggling to capture high-order regulatory relationships and global topological structures, thus limiting a deeper understanding of complex regulatory interactions.</div><div>To address this, we propose HpMiX, a Graph Topology-Constrained Mixup (GTCM) and hypergraph residual enhancement learning framework for the discovery of disease-related ceRNA biomarkers. This framework first constructs a CENA network encompassing multi-molecule associations, including miRNA, lncRNA, circRNA, and mRNA, and models higher-order regulatory relationships using K-hop hyperedges. Biologically meaningful initial features are then extracted from CENA via a multi-structure hypergraph weighted random walk method (MHWRW), integrating prior biological knowledge and regulatory information. Subsequently, graph topology-constrained Mixup and multi-head attention, combined with a residual hypergraph neural network, are employed to generate robust node embeddings with both local and global context, enabling the identification of potential disease-ceRNA biomarkers.</div><div>Prediction results across multiple disease biomarkers demonstrate that HpMiX significantly outperforms state-of-the-art methods, validating its effectiveness in biological regulatory network representation learning. Case studies further confirm that the framework can effectively identify differentially expressed ceRNAs in diseases, highlighting its potential as a tool for pre-screening high-probability disease biomarkers.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108662"},"PeriodicalIF":6.3,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146127083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-27 | DOI: 10.1016/j.neunet.2026.108601
Dingyao Chen , Xiao Teng , Xingyu Shen , Xun Yang , Long Lan
Knowledge distillation (KD) is an effective strategy to transfer learned representations from a pre-trained teacher model to a smaller student model. Current methods for knowledge transfer from convolutional neural networks (CNNs) to vision transformers (ViTs) mainly align output logits. However, such approaches often overlook the rich semantic structures encoded in CNN features, thereby restricting ViTs from effectively inheriting the inductive biases inherent in convolutional architectures. To this end, this paper proposes a Feature-based CNN-to-ViT Structural Knowledge Distillation framework, dubbed FSKD, which combines the semantic structural knowledge embedded in CNN (teacher) features with the strength of ViT (student) in capturing long-range dependencies. Specifically, this framework includes a feature alignment module to bridge the representational gap between CNN and ViT features, and it incorporates a global feature alignment loss. Additionally, we develop patch-wise and attention-wise distillation losses to transfer inter-patch similarity and attention distribution, facilitating semantic structural knowledge transfer from CNNs to ViTs. Experimental results demonstrate that the proposed method considerably enhances ViT performance in visual recognition tasks, particularly under scenarios with limited data. Code is available at Github.
{"title":"Distilling structural knowledge from CNNs to vision transformers for data-efficient visual recognition","authors":"Dingyao Chen , Xiao Teng , Xingyu Shen , Xun Yang , Long Lan","doi":"10.1016/j.neunet.2026.108601","DOIUrl":"10.1016/j.neunet.2026.108601","url":null,"abstract":"<div><div>Knowledge distillation (KD) is an effective strategy to transfer learned representations from a pre-trained <span>teacher</span> model to a smaller <span>student</span> model. Current methods for knowledge transfer from convolutional neural networks (CNNs) to vision transformers (ViTs) mainly align output logits. However, such approaches often overlook the rich semantic structures encoded in CNN features, thereby restricting ViTs from effectively inheriting the inductive biases inherent in convolutional architectures. To this end, this paper proposes a <strong>F</strong>eature-based CNN-to-ViT <strong>S</strong>tructural <strong>K</strong>nowledge <strong>D</strong>istillation framework, dubbed <strong>FSKD</strong>, which combines the semantic structural knowledge embedded in CNN (<span>teacher</span>) features with the strength of ViT (<span>student</span>) in capturing long-range dependencies. Specifically, this framework includes a feature alignment module to bridge the representational gap between CNN and ViT features, and it incorporates a global feature alignment loss. Additionally, we develop patch-wise and attention-wise distillation losses to transfer inter-patch similarity and attention distribution, facilitating semantic structural knowledge transfer from CNNs to ViTs. Experimental results demonstrate that the proposed method considerably enhances ViT performance in visual recognition tasks, particularly under scenarios with limited data. Code is available at <span><span>Github</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108601"},"PeriodicalIF":6.3,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-25 | DOI: 10.1016/j.neunet.2026.108642
Yao Liang , Yuwei Wang , Yang Li , Yi Zeng
Parameter-efficient fine-tuning (PEFT) reduces the compute and memory demands of adapting large language models, yet standard low-rank adapters (e.g., LoRA) can lag full fine-tuning in performance and stability because they restrict updates to a fixed rank-r subspace. We propose Matrix-Transformation based Low-Rank Adaptation (MTLoRA), a brain-inspired extension that inserts a learnable r × r transformation T into the low-rank update (ΔW = BTA). By endowing the subspace with data-adapted geometry (e.g., rotations, scalings, and shears), MTLoRA reparameterizes the rank-r hypothesis class, improving its conditioning and inductive bias at negligible O(r²) overhead, and recovers LoRA when T = I_r. We instantiate four structures for T: SHIM (T = C), ICFM (T = CC^T), CTCM (T = CD), and DTSM (T = C + D), providing complementary inductive biases (change of basis, PSD metric, staged mixing, dual superposition). An optimization analysis shows that T acts as a learned preconditioner within the subspace, yielding spectral-norm step-size bounds and operator-norm variance contraction that stabilize training. Empirically, MTLoRA delivers consistent gains while preserving PEFT efficiency: on GLUE (General Language Understanding Evaluation) with DeBERTaV3-base, MTLoRA improves the average over LoRA by +2.0 points (86.9 → 88.9) and matches AdaLoRA (88.9) without any pruning schedule; on natural language generation with GPT-2 Medium, it raises BLEU on DART by +0.95 and on WebNLG by +0.56; and in multimodal instruction tuning with LLaVA-1.5-7B, DTSM attains the best average (69.91) with ∼4.7% trainable parameters, outperforming full fine-tuning and strong PEFT baselines. These results indicate that learning geometry inside the low-rank subspace improves both effectiveness and stability, making MTLoRA a practical, plug-compatible alternative to LoRA for large-model fine-tuning.
{"title":"Matrix-Transformation based Low-Rank Adaptation (MTLoRA): A brain-Inspired method for parameter-Efficient fine-Tuning","authors":"Yao Liang , Yuwei Wang , Yang Li , Yi Zeng","doi":"10.1016/j.neunet.2026.108642","DOIUrl":"10.1016/j.neunet.2026.108642","url":null,"abstract":"<div><div>Parameter-efficient fine-tuning (PEFT) reduces the compute and memory demands of adapting large language models, yet standard low-rank adapters (e.g., LoRA) can lag full fine-tuning in performance and stability because they restrict updates to a fixed rank-<em>r</em> subspace. We propose Matrix-Transformation based Low-Rank Adaptation (MTLoRA), a brain-inspired extension that inserts a learnable <em>r</em> × <em>r</em> transformation <em>T</em> into the low-rank update (<span><math><mrow><mstyle><mi>Δ</mi></mstyle><mi>W</mi><mo>=</mo><mi>B</mi><mi>T</mi><mi>A</mi></mrow></math></span>). By endowing the subspace with data-adapted geometry (e.g., rotations, scalings, and shears), MTLoRA reparameterizes the rank-<em>r</em> hypothesis class, improving its conditioning and inductive bias at negligible <em>O</em>(<em>r</em><sup>2</sup>) overhead, and recovers LoRA when <span><math><mrow><mi>T</mi><mo>=</mo><msub><mi>I</mi><mi>r</mi></msub></mrow></math></span>. We instantiate four structures for <em>T</em>—SHIM <span><math><mrow><mo>(</mo><mi>T</mi><mo>=</mo><mi>C</mi><mo>)</mo></mrow></math></span>, ICFM <span><math><mrow><mo>(</mo><mi>T</mi><mo>=</mo><mi>C</mi><msup><mi>C</mi><mi>⊤</mi></msup><mo>)</mo></mrow></math></span>, CTCM <span><math><mrow><mo>(</mo><mi>T</mi><mo>=</mo><mi>C</mi><mi>D</mi><mo>)</mo></mrow></math></span>, and DTSM <span><math><mrow><mo>(</mo><mi>T</mi><mo>=</mo><mi>C</mi><mo>+</mo><mi>D</mi><mo>)</mo></mrow></math></span>—providing complementary inductive biases (change of basis, PSD metric, staged mixing, dual superposition). An optimization analysis shows that <em>T</em> acts as a learned preconditioner within the subspace, yielding spectral-norm step-size bounds and operator-norm variance contraction that stabilize training. Empirically, MTLoRA delivers consistent gains while preserving PEFT efficiency: on GLUE (General Language Understanding Evaluation) with DeBERTaV3-base, MTLoRA improves the average over LoRA by (+2.0) points (86.9 → 88.9) and matches AdaLoRA (88.9) without any pruning schedule; on natural language generation with GPT-2 Medium, it raises BLEU on DART by (+0.95) and on WebNLG by (+0.56); and in multimodal instruction tuning with LLaVA-1.5-7B, DTSM attains the best average (69.91) with ∼ 4.7% trainable parameters, outperforming full fine-tuning and strong PEFT baselines. These results indicate that learning geometry inside the low-rank subspace improves both effectiveness and stability, making MTLoRA a practical, plug-compatible alternative to LoRA for large-model fine-tuning.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108642"},"PeriodicalIF":6.3,"publicationDate":"2026-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.neunet.2026.108641
Georg Kohl, Li-Wei Chen, Nils Thuerey
Simulating turbulent flows is crucial for a wide range of applications, and machine learning-based solvers are gaining increasing relevance. However, achieving temporal stability when generalizing to longer rollout horizons remains a persistent challenge for learned PDE solvers. In this work, we analyze if fully data-driven fluid solvers that utilize an autoregressive rollout based on conditional diffusion models are a viable option to address this challenge. We investigate accuracy, posterior sampling, spectral behavior, and temporal stability, while requiring that methods generalize to flow parameters beyond the training regime. To quantitatively and qualitatively benchmark the performance of various flow prediction approaches, three challenging 2D scenarios including incompressible and transonic flows, as well as isotropic turbulence are employed. We find that even simple diffusion-based approaches can outperform multiple established flow prediction methods in terms of accuracy and temporal stability, while being on par with state-of-the-art stabilization techniques like unrolling at training time. Such traditional architectures are superior in terms of inference speed, however, the probabilistic nature of diffusion approaches allows for inferring multiple predictions that align with the statistics of the underlying physics. Overall, our benchmark contains three carefully chosen data sets that are suitable for probabilistic evaluation alongside various established flow prediction architectures.
{"title":"Benchmarking autoregressive conditional diffusion models for turbulent flow simulation","authors":"Georg Kohl, Li-Wei Chen, Nils Thuerey","doi":"10.1016/j.neunet.2026.108641","DOIUrl":"10.1016/j.neunet.2026.108641","url":null,"abstract":"<div><div>Simulating turbulent flows is crucial for a wide range of applications, and machine learning-based solvers are gaining increasing relevance. However, achieving temporal stability when generalizing to longer rollout horizons remains a persistent challenge for learned PDE solvers. In this work, we analyze if fully data-driven fluid solvers that utilize an autoregressive rollout based on conditional diffusion models are a viable option to address this challenge. We investigate accuracy, posterior sampling, spectral behavior, and temporal stability, while requiring that methods generalize to flow parameters beyond the training regime. To quantitatively and qualitatively benchmark the performance of various flow prediction approaches, three challenging 2D scenarios including incompressible and transonic flows, as well as isotropic turbulence are employed. We find that even simple diffusion-based approaches can outperform multiple established flow prediction methods in terms of accuracy and temporal stability, while being on par with state-of-the-art stabilization techniques like unrolling at training time. Such traditional architectures are superior in terms of inference speed, however, the probabilistic nature of diffusion approaches allows for inferring multiple predictions that align with the statistics of the underlying physics. Overall, our benchmark contains three carefully chosen data sets that are suitable for probabilistic evaluation alongside various established flow prediction architectures.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108641"},"PeriodicalIF":6.3,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146151141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-22 | DOI: 10.1016/j.neunet.2026.108615
Xinxin Li , Juan Zhang , Da Li , Xingyu Liu , Jin Xu , Junping Yin
Automatically discovering mathematical expressions that precisely depict natural phenomena is a challenging problem, for which Symbolic Regression (SR) is one of the most widely utilized techniques. Mainstream SR algorithms focus on searching for the optimal symbolic tree, but the increasing complexity of the tree structure often limits their performance. Inspired by neural networks, symbolic networks have emerged as a promising new paradigm. However, existing symbolic networks still face certain challenges: binary nonlinear operators {×, ÷} cannot be naturally extended to the multivariate case, and training with a fixed architecture often leads to higher complexity and overfitting. In this work, we propose a Unified Symbolic Network (UniSymNet) that unifies nonlinear binary operators into nested unary operators, thereby transforming them into multivariate operators. The capability of the proposed UniSymNet is established through rigorous theoretical proof, yielding lower complexity and stronger expressivity. Unlike conventional neural network training, we design a bi-level optimization framework: the outer level pre-trains a Transformer with a sparse label encoding scheme to guide UniSymNet structure selection, while the inner level employs objective-specific strategies to optimize network parameters. This allows the UniSymNet structure to adapt flexibly to different data, leading to reduced expression complexity. UniSymNet is evaluated on low-dimensional Standard Benchmarks and high-dimensional SRBench, and shows an excellent symbolic solution rate, high fitting accuracy, and relatively low expression complexity.
{"title":"UniSymNet: A Unified Symbolic Network with Sparse Encoding and Bi-level Optimization","authors":"Xinxin Li , Juan Zhang , Da Li , Xingyu Liu , Jin Xu , Junping Yin","doi":"10.1016/j.neunet.2026.108615","DOIUrl":"10.1016/j.neunet.2026.108615","url":null,"abstract":"<div><div>Automatically discovering mathematical expressions is a challenging issue to precisely depict natural phenomena, in which Symbolic Regression (SR) is one of the most widely utilized techniques. Mainstream SR algorithms target on searching for the optimal symbolic tree, but the increasing complexity of the tree structure often limits their performance. Inspired by neural networks, symbolic networks have emerged as a promising new paradigm. However, existing symbolic networks still face certain challenges: binary nonlinear operators { × , ÷} cannot be naturally extended to multivariate, training with fixed architecture often leads to higher complexity and overfitting. In this work, we propose a <strong>Uni</strong>fied <strong>Sym</strong>bolic <strong>Net</strong>work that unifies nonlinear binary operators into nested unary operators, thereby transforming them into multivariate operators. The capability of the proposed UniSymNet is deduced from rigorous theoretical proof, resulting in lower complexity and stronger expressivity. Unlike the conventional neural network training, we design a bi-level optimization framework: the outer level pre-trains a Transformer with sparse label encoding scheme to guide UniSymNet structure selection, while the inner level employs objective-specific strategies to optimize network parameters. This allows for flexible adaptation of UniSymNet structures to different data, leading to reduced expression complexity. The UniSymNet is evaluated on low-dimensional Standard Benchmarks and high-dimensional SRBench, and shows excellent symbolic solution rate, high fitting accuracy, and relatively low expression complexity.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108615"},"PeriodicalIF":6.3,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146108216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-22 | DOI: 10.1016/j.neunet.2026.108620
Zongshen Mu , Yujie Wan , Yueting Zhuang , Jie Tan , Hong Cheng , Yueyang Wang
Financial sentiment analysis (FSA) refers to the task of classifying textual content into predefined sentiment categories to analyze their potential impacts on financial market fluctuations. However, directly applying pre-trained large language models (LLMs) to FSA still poses significant challenges. Existing approaches fail to align with domain-specific objectives and struggle to adapt to customized financial data schemas. Moreover, these LLMs predict stock changes primarily from each stock's own information, failing to take into account cross-impact among related stocks. In this paper, we propose a novel framework that synergizes an LLM with a Graph Neural Network (GNN) to model stock price dynamics, leveraging stock sentiment signals extracted from financial news. Specifically, we employ the open-source Llama-3-8B model as the backbone, then enhance its sensitivity to financial sentiment patterns through supervised fine-tuning (SFT) and direct preference optimization (DPO) techniques. Leveraging the sentiment outputs from the fine-tuned LLM, we design a GNN to enhance stock representations and model cross-asset dependencies via two types of text-attributed graphs, which dynamically encode time-varying price correlations. Experiments on the Chinese A-share market demonstrate that financial sentiment significantly influences stock price variations. Our framework outperforms previous baselines and exhibits an average improvement of 50% in Sharpe ratio.
{"title":"Exploring financial sentiment analysis via fine-tuning large language model and attributed graph neural network","authors":"Zongshen Mu , Yujie Wan , Yueting Zhuang , Jie Tan , Hong Cheng , Yueyang Wang","doi":"10.1016/j.neunet.2026.108620","DOIUrl":"10.1016/j.neunet.2026.108620","url":null,"abstract":"<div><div>Financial sentiment analysis (FSA) refers to the task of classifying textual content into predefined sentiment categories to analyze their potential impacts on financial market fluctuations. However, directly applying these pre-trained LLMs to FSA still poses significant challenges. Existing approaches fail to align with domain-specific objectives and struggle to adapt to customized financial data schemas. Moreover, these LLMs predict the stock change primarily depending on its own information, failing to take into account cross-impact among relevant stocks. In this paper, we propose a novel framework that synergizes an LLM with a Graph Neural Network (GNN) to model stock price dynamics, leveraging stock sentiment signals extracted from financial news. Specifically, we employ the open-source Llama-3-8B model as the backbone, then enhance its sensitivity to financial sentiment patterns through supervised fine-tuning (SFT) and direct preference optimization (DPO) techniques. Leveraging the sentiment outputs from the fine-tuned LLM, we design a GNN to enhance stock representations and model cross-asset dependencies via two types of text-attributed graphs, which dynamically encode time-varying price correlations. Experiments on the Chinese A-share market demonstrate that financial sentiment significantly influences stock price variations. Our framework outperforms previous baselines and exhibits an average improvement of 50% in Sharpe ratio.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108620"},"PeriodicalIF":6.3,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146114716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-22 | DOI: 10.1016/j.neunet.2026.108606
Shanzhi Gu , Zhaoyang Qu , Ruotong Geng , Mingyang Geng , Shangwen Wang , Chuanfu Xu , Haotian Wang , Zhipeng Lin , Dezun Dong
Large Language Models for Code (LLMs4Code) have achieved strong performance in code generation, but recent studies reveal that they may memorize and leak sensitive information contained in training data, posing serious privacy risks. To address this gap, this work presents the first comprehensive empirical study on applying machine unlearning to mitigate sensitive information leakage in LLMs4Code. We first construct a dedicated benchmark that includes: (i) a synthetic forget set containing diverse forms of personal information, and (ii) a retain set designed to evaluate whether code-generation capability is preserved after unlearning. Using this benchmark, we systematically assess three representative unlearning algorithms (GA, GA+GD, GA+KL) across three widely used open-source LLMs4Code models (AIXCoder-7B, CodeLlama-7B, CodeQwen-7B). Experimental results demonstrate that machine unlearning can substantially reduce direct memorization-based leakage: on average, the direct leak rate drops by more than 50% while over 91% of the original code-generation performance is retained. Moreover, by analyzing post-unlearning outputs, we uncover a consistent shift from direct to indirect leakage, revealing an underexplored vulnerability that persists even when the target data has been successfully forgotten. Our findings show that machine unlearning is a feasible and effective solution for enhancing privacy protection in LLMs4Code, while also highlighting the need for future techniques capable of mitigating both direct and indirect leakage simultaneously.
{"title":"Mitigating sensitive information leakage in LLMs4Code through machine unlearning","authors":"Shanzhi Gu , Zhaoyang Qu , Ruotong Geng , Mingyang Geng , Shangwen Wang , Chuanfu Xu , Haotian Wang , Zhipeng Lin , Dezun Dong","doi":"10.1016/j.neunet.2026.108606","DOIUrl":"10.1016/j.neunet.2026.108606","url":null,"abstract":"<div><div>Large Language Models for Code (LLMs4Code) have achieved strong performance in code generation, but recent studies reveal that they may memorize and leak sensitive information contained in training data, posing serious privacy risks. To address this gap, this work presents the first comprehensive empirical study on applying machine unlearning to mitigate sensitive information leakage in LLMs4Code. We first construct a dedicated benchmark that includes: (i) a synthetic <em>forget set</em> containing diverse forms of personal information, and (ii) a <em>retain set</em> designed to evaluate whether code-generation capability is preserved after unlearning. Using this benchmark, we systematically assess three representative unlearning algorithms (GA, GA+GD, GA+KL) across three widely used open-source LLMs4Code models (AIXCoder-7B, CodeLlama-7B, CodeQwen-7B). Experimental results demonstrate that machine unlearning can substantially reduce direct memorization-based leakage: on average, the direct leak rate drops by more than <strong>50%</strong> while retaining about <strong>over 91%</strong> of the original code-generation performance. Moreover, by analyzing post-unlearning outputs, we uncover a consistent shift from <em>direct</em> to <em>indirect</em> leakage, revealing an underexplored vulnerability that persists even when the target data has been successfully forgotten. Our findings show that machine unlearning is a feasible and effective solution for enhancing privacy protection in LLMs4Code, while also highlighting the need for future techniques capable of mitigating both direct and indirect leakage simultaneously.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108606"},"PeriodicalIF":6.3,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}