Pub Date: 2026-07-01 | Epub Date: 2026-02-03 | DOI: 10.1016/j.neunet.2026.108673
Chao Chen, Xujia Li, Dongsheng Hong, Shanshan Lin, Xiangwen Liao, Chuanyi Liu, Lei Chen
The challenges of training and inference in few-shot environments persist in graph representation learning. The quality and quantity of labels are often insufficient because annotating graph data requires extensive expert knowledge. In this context, Few-Shot Graph Learning (FSGL) approaches have been developed over the years. Through sophisticated neural architectures and customized training pipelines, these approaches enhance model adaptability to new label distributions. However, compromises in robustness and interpretability can result in overfitting to noise in the labeled data and degraded performance. This paper introduces BAED, the first explanation-in-the-loop framework for the FSGL problem. We employ the belief propagation algorithm in a novel way to augment labels on graphs. Then, leveraging an auxiliary graph neural network and gradient backpropagation, our framework extracts explanatory subgraphs surrounding target nodes. The final predictions are based on these informative subgraphs, mitigating the influence of redundant information from neighboring nodes. Extensive experiments on seven benchmark datasets demonstrate BAED's superior prediction accuracy, training efficiency, and explanation quality. As pioneering work, this paper highlights the potential of the explanation-based research paradigm in FSGL.
"BAED: A new paradigm for few-shot graph learning with explanation in the loop" (Neural Networks, vol. 199, Article 108673).
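BAED's label-augmentation step can be illustrated with a toy propagation scheme. This is a minimal sketch only: the abstract does not give the exact belief-propagation update, so the neighbor-averaging rule, damping factor, and clamping of labeled nodes below are illustrative assumptions, not the paper's method.

```python
def propagate_labels(adj, seed_labels, num_classes, iters=10, damping=0.5):
    """Spread class beliefs from a few labeled seed nodes over a graph.

    adj: {node: [neighbors]}; seed_labels: {node: class_index}.
    Returns {node: [belief per class]} with labeled nodes clamped.
    """
    uniform = [1.0 / num_classes] * num_classes
    belief = {v: list(uniform) for v in adj}
    for v, c in seed_labels.items():
        belief[v] = [1.0 if k == c else 0.0 for k in range(num_classes)]
    for _ in range(iters):
        new = {}
        for v in adj:
            if v in seed_labels:            # keep labeled nodes clamped
                new[v] = belief[v]
                continue
            agg = [0.0] * num_classes       # sum neighbor beliefs
            for u in adj[v]:
                for k in range(num_classes):
                    agg[k] += belief[u][k]
            z = sum(agg) or 1.0
            agg = [a / z for a in agg]      # normalize to a distribution
            new[v] = [damping * b + (1 - damping) * a
                      for b, a in zip(belief[v], agg)]
        belief = new
    return belief
```

On a three-node path with a single seed, the seed's class dominates the beliefs of its one- and two-hop neighbors after a few iterations, which is the augmentation effect the abstract describes.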
Pub Date: 2026-07-01 | Epub Date: 2026-02-04 | DOI: 10.1016/j.neunet.2026.108684
Zhijun Zhang, Xitong Gao, Jinjia Guo
To solve the repetitive motion problem of redundant robotic manipulators, a punishment neural network-based acceleration-level joint drift-free (PNN-ALJDF) scheme is designed. Traditional joint physical-limit constraints are fixed and lack margin, so a novel time-varying joint acceleration constraint is incorporated into the PNN-ALJDF scheme to prevent joint states from exceeding the physical limits. In addition, to ensure that redundant robotic manipulators can periodically return to the initial pose, a joint drift-free criterion is designed. Furthermore, the joint drift-free criterion, the kinematics equation, and the time-varying joint acceleration constraint are formulated globally as an acceleration-level joint drift-free (ALJDF) scheme via a time-varying quadratic programming approach. The ALJDF scheme is then solved by the designed punishment neural network; the proposed PNN-ALJDF scheme thus comprises the ALJDF scheme and the punishment neural network. Finally, simulations demonstrate that the PNN-ALJDF scheme prevents joint drift and keeps all joint states within the time-varying acceleration constraint. In addition, the proposed PNN-ALJDF achieves higher solution accuracy than the linear variational inequalities-based primal-dual neural network.
"A punishment neural network-based acceleration-level joint drift-free scheme for solving constrained motion planning problem of redundant robotic manipulators" (Neural Networks, vol. 199, Article 108684).
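The "punishment" idea, i.e., solving a constrained optimization by penalizing constraint violations inside a gradient flow, can be sketched in a few lines. This sketch assumes fixed (not time-varying) box bounds, a plain quadratic objective, and an Euler-discretized flow; the paper's actual scheme operates on manipulator kinematics with time-varying constraints.

```python
def penalty_qp_solve(Q, c, lo, hi, rho=50.0, lr=0.01, steps=5000):
    """Minimize 0.5 x^T Q x + c^T x subject to lo <= x <= hi by a
    penalty ("punishment") gradient flow, discretized with Euler steps.

    Q: n x n nested list; c, lo, hi: length-n lists. rho is the
    punishment weight on bound violations.
    """
    n = len(c)
    x = [0.0] * n
    for _ in range(steps):
        # gradient of the quadratic objective at the current point
        g = [sum(Q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
        for i in range(n):
            if x[i] > hi[i]:
                g[i] += rho * (x[i] - hi[i])   # punish upper-bound violation
            elif x[i] < lo[i]:
                g[i] += rho * (x[i] - lo[i])   # punish lower-bound violation
            x[i] -= lr * g[i]
    return x
```

With a finite penalty weight the solution sits slightly outside the active bound (here within about 2% for rho = 50), which is the usual penalty-method approximation error; the exact solver in the paper avoids this via its neural network design.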
Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.neunet.2026.108666
Geonhui Son, Jeong Ryong Lee, Dosik Hwang
Generative Adversarial Networks (GANs) have made significant progress in enhancing the quality of image synthesis. Recent methods frequently leverage pretrained networks to compute perceptual losses or to exploit pretrained feature spaces. In this paper, we extend the capabilities of pretrained networks by incorporating self-supervised learning techniques and enforcing consistency between discriminators during GAN training. Our proposed method, named HP-GAN, exploits neural network priors through two primary strategies: FakeTwins and discriminator consistency. FakeTwins uses pretrained networks as encoders to compute a self-supervised loss on the generated images and applies it to train the generator, enabling the generation of more diverse and higher-quality images. Additionally, we introduce a consistency mechanism between discriminators that evaluate feature maps extracted from Convolutional Neural Network (CNN) and Vision Transformer (ViT) feature networks. Discriminator consistency promotes coherent learning among the discriminators and enhances training robustness by aligning their assessments of image quality. Our extensive evaluation across seventeen datasets, covering large, small, and limited-data scenarios and a variety of image domains, demonstrates that HP-GAN consistently outperforms current state-of-the-art methods in terms of Fréchet Inception Distance (FID), achieving significant improvements in image diversity and quality. Code is available at: https://github.com/higun2/HP-GAN.
"HP-GAN: Harnessing pretrained networks for GAN improvement with FakeTwins and discriminator consistency" (Neural Networks, vol. 199, Article 108666).
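The discriminator-consistency mechanism can be illustrated with a simple alignment loss between the realness scores of the two discriminators. The mean-squared-difference form below is an assumption; the abstract only says the discriminators' assessments are aligned, without specifying the loss.

```python
def discriminator_consistency_loss(scores_cnn, scores_vit):
    """Mean squared difference between per-image realness scores from a
    CNN-feature discriminator and a ViT-feature discriminator.

    Both arguments are equal-length lists of scalar scores for the same
    batch of images; a value of 0.0 means the two discriminators agree.
    """
    assert len(scores_cnn) == len(scores_vit), "scores must be paired per image"
    n = len(scores_cnn)
    return sum((a - b) ** 2 for a, b in zip(scores_cnn, scores_vit)) / n
```

Minimizing such a term during training pushes the two discriminators toward coherent judgments, which is the stated goal of the consistency strategy.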
Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.neunet.2026.108665
Cheng Peng, Zeze Tao, Junyu Liu, Jinjia Peng
Transfer-based attacks generate adversarial examples on a surrogate model and exploit the intriguing property of transferability to deceive other, unknown models, making them practical for real-world scenarios. Recent research has sought to optimize the loss surface by minimizing its maximum loss, which in practice cannot be computed exactly and is instead approximated through gradient ascent. However, the loss landscape becomes increasingly non-linear during later attack stages, making gradient ascent less effective. To address this challenge, we propose a novel attack called Curvature-Aware Penalization (CAP), which incorporates the gradient norm and a curvature-aware term as regularizers to maintain the flatness of the loss surface. Since directly computing the Hessian matrix is computationally expensive, we use the finite difference method to reduce computational complexity. Specifically, we randomly sample an example from the neighborhood and interpolate gradients at three neighboring points along the example's gradient direction to approximate the Hessian. Additionally, to reduce the variance introduced by random sampling, the combined gradients are averaged over multiple stochastic samples. Comprehensive experimental results demonstrate that CAP not only crafts adversarial examples with enhanced transferability across various network architectures but also exhibits stronger resistance to state-of-the-art adversarial defense methods. Code is available at https://github.com/PC614/CAP.
"Enhancing adversarial transferability via curvature-aware penalization" (Neural Networks, vol. 199, Article 108665).
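The three-point finite-difference idea corresponds to the standard central-difference estimate of directional curvature: (L(x + hv) - 2 L(x) + L(x - hv)) / h^2 approximates the quadratic form v^T H v without forming the Hessian H. A minimal sketch (CAP's neighborhood sampling and gradient averaging are omitted):

```python
def directional_curvature(loss, x, v, h=1e-3):
    """Approximate v^T H v, the curvature of `loss` along direction v,
    with a central finite difference over three points: x-hv, x, x+hv.

    loss: callable taking a list of floats; x, v: equal-length lists.
    """
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return (loss(xp) - 2.0 * loss(x) + loss(xm)) / (h * h)
```

For a quadratic loss sum(x_i^2) the Hessian is 2I, so the estimate along any unit coordinate direction is 2, which the test below checks; each curvature estimate costs only three loss evaluations.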
Pub Date: 2026-07-01 | Epub Date: 2026-02-05 | DOI: 10.1016/j.neunet.2026.108690
Chen Guan, Haihong Ai, Weiwei Wang, Ravi P. Singh, Shiya Song
Diffusion models have application potential in medical image classification tasks due to their effectiveness in eliminating unexpected noise and perturbations from medical images. However, existing diffusion models for medical image classification use image features as the condition guiding denoising, neglecting the most critical structured semantic information in medical images: the mask of the lesion region. This results in suboptimal denoising and, consequently, impaired classification performance. To address this issue, we propose DiffMCG, a diffusion model with a mask-conditioned guiding module. First, we introduce the Mask-Conditioned Guiding (MCG) module, which concurrently extracts features from the medical image and its corresponding mask. Second, we design a U-Net denoising network based on the multi-layer perceptron (MLP) that is tailored to low-dimensional vector data and performs denoising in the category label space. Furthermore, we introduce an MMD regularization loss to establish a distributional relationship between the image prediction distribution, the mask prediction distribution, and the ground-truth label distribution within the label prediction space, ensuring the consistency of multimodal information during the diffusion process. Through comparative and ablation experiments, we validate the advantages of the MCG module in medical image classification, providing technical support for precision medical diagnostics.
"DiffMCG: A diffusion model with mask-conditioned guiding module for medical image classification" (Neural Networks, vol. 199, Article 108690).
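The MMD regularization can be illustrated with the standard (biased) estimator of squared Maximum Mean Discrepancy between two sample sets under an RBF kernel. The kernel choice and bandwidth `gamma` below are assumptions; the abstract does not specify them.

```python
import math

def rbf(a, b, gamma):
    # Gaussian (RBF) kernel between two vectors given as lists
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between sample sets X and Y.

    X, Y: lists of equal-dimension vectors (lists of floats).
    Zero when the two empirical distributions coincide; positive otherwise.
    """
    m, n = len(X), len(Y)
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / (m * m)
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / (n * n)
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (m * n)
    return kxx + kyy - 2.0 * kxy
```

In DiffMCG's setting, X and Y would be prediction distributions (image-conditioned, mask-conditioned, or ground truth) in the label space, and minimizing such a term pulls the distributions together.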
Pub Date: 2026-07-01 | Epub Date: 2026-01-25 | DOI: 10.1016/j.neunet.2026.108642
Yao Liang, Yuwei Wang, Yang Li, Yi Zeng
Parameter-efficient fine-tuning (PEFT) reduces the compute and memory demands of adapting large language models, yet standard low-rank adapters (e.g., LoRA) can lag full fine-tuning in performance and stability because they restrict updates to a fixed rank-r subspace. We propose Matrix-Transformation based Low-Rank Adaptation (MTLoRA), a brain-inspired extension that inserts a learnable r × r transformation T into the low-rank update (ΔW = BTA). By endowing the subspace with data-adapted geometry (e.g., rotations, scalings, and shears), MTLoRA reparameterizes the rank-r hypothesis class, improving its conditioning and inductive bias at negligible O(r^2) overhead, and recovers LoRA when T = I_r. We instantiate four structures for T: SHIM (T = C), ICFM (T = CC^T), CTCM (T = CD), and DTSM (T = C + D), providing complementary inductive biases (change of basis, PSD metric, staged mixing, dual superposition). An optimization analysis shows that T acts as a learned preconditioner within the subspace, yielding spectral-norm step-size bounds and operator-norm variance contraction that stabilize training. Empirically, MTLoRA delivers consistent gains while preserving PEFT efficiency: on GLUE (General Language Understanding Evaluation) with DeBERTaV3-base, MTLoRA improves the average over LoRA by +2.0 points (86.9 → 88.9) and matches AdaLoRA (88.9) without any pruning schedule; on natural language generation with GPT-2 Medium, it raises BLEU on DART by +0.95 and on WebNLG by +0.56; and in multimodal instruction tuning with LLaVA-1.5-7B, DTSM attains the best average (69.91) with ~4.7% trainable parameters, outperforming full fine-tuning and strong PEFT baselines. These results indicate that learning geometry inside the low-rank subspace improves both effectiveness and stability, making MTLoRA a practical, plug-compatible alternative to LoRA for large-model fine-tuning.
"Matrix-Transformation based Low-Rank Adaptation (MTLoRA): A brain-inspired method for parameter-efficient fine-tuning" (Neural Networks, vol. 199, Article 108642).
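The core reparameterization ΔW = BTA is easy to state in code. A minimal nested-list sketch (the trainable-parameter machinery is omitted) showing that T = I_r recovers the plain LoRA update BA:

```python
def matmul(A, B):
    # plain nested-list matrix product: (p x q) @ (q x s) -> (p x s)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(r):
    # r x r identity matrix I_r
    return [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)]

def mtlora_delta(B, T, A):
    """MTLoRA weight update dW = B T A.

    B: d_out x r, T: r x r, A: r x d_in. With T = I_r this reduces
    to the plain LoRA update B A; structured choices of T (e.g.
    T = C, C C^T, C D, or C + D) give the paper's four variants.
    """
    return matmul(matmul(B, T), A)
```

Since T is only r × r, storing and multiplying it adds O(r^2) parameters and work on top of LoRA's O(r(d_in + d_out)), which is the negligible overhead the abstract claims.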
Pub Date: 2026-07-01 | Epub Date: 2026-01-30 | DOI: 10.1016/j.neunet.2026.108670
Tianyu Wang, Maite Zhang, Mingxuan Lu, Mian Li
In real-world applications, tabular datasets often evolve over time, leading to temporal shifts that degrade long-range neural network performance. Most existing temporal encoding or adaptation solutions treat time cues as fixed auxiliary variables at a single scale. Motivated by the multi-horizon nature of temporal shifts with heterogeneous temporal dynamics, this paper presents TARS (Temporal Abstraction with Routed Scales), a novel plug-and-play method for robust tabular learning under temporal shift that is applicable to various deep learning backbones. First, an explicit temporal encoder decomposes timestamps into short-term recency, mid-term periodicity, and long-term contextual embeddings with structured memory. Next, an implicit drift encoder tracks higher-order distributional statistics at the same aligned timescales, producing drift signals that reflect ongoing temporal dynamics. These signals drive a drift-aware routing mechanism that adaptively weights the explicit temporal pathways, emphasizing the timescales most relevant to current conditions. Finally, a feature-temporal fusion layer integrates the routed temporal representation with the original features, injecting a context-aware bias. Extensive experiments on eight real-world datasets from the TabReD benchmark show that TARS consistently outperforms competitive baselines across various backbone models, achieving average relative improvements of up to +2.38% on MLP and +4.08% on DCNv2, among others. Ablation studies verify the complementary contributions of all four modules. These results highlight the effectiveness of TARS in improving the temporal robustness of existing deep tabular models.
"Multi-timescale representation with adaptive routing for deep tabular learning under temporal shift" (Neural Networks, vol. 199, Article 108670).
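The drift-aware routing step can be sketched as a softmax over per-timescale drift signals followed by a weighted sum of the timescale embeddings. The softmax form is an assumption; the abstract states only that drift signals adaptively weight the explicit temporal pathways.

```python
import math

def route_timescales(drift_signals, embeddings):
    """Route among temporal pathways by drift magnitude.

    drift_signals: one scalar per timescale (short/mid/long, etc.).
    embeddings: one embedding vector (list of floats) per timescale.
    Returns (softmax routing weights, fused representation).
    """
    mx = max(drift_signals)                     # stabilize the softmax
    exps = [math.exp(d - mx) for d in drift_signals]
    z = sum(exps)
    w = [e / z for e in exps]
    dim = len(embeddings[0])
    fused = [sum(w[k] * embeddings[k][i] for k in range(len(w)))
             for i in range(dim)]
    return w, fused
```

With equal drift signals every timescale contributes equally; a dominant drift signal routes almost all weight to its pathway, which is the adaptive emphasis the abstract describes.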
Pub Date: 2026-07-01 | Epub Date: 2026-02-04 | DOI: 10.1016/j.neunet.2026.108671
Yiyao Xu, Mengxin Wang, Ruoyu Yuan, Sitian Qin
Game theory provides important theoretical and methodological support for extending deep learning to complex interaction scenarios, multi-player systems, and related settings. This paper investigates the networked game as a key problem. Since neighboring players inevitably communicate with each other, key challenges in practical applications are reducing the communication cost and improving the convergence rate. Moreover, with the rapid development of cyber-physical systems, players need to obey identical intrinsic dynamics. Hence, in this paper, the communication channel is equipped with a one-to-one gradient-based event trigger and a logarithmic quantizer, which effectively alleviate the communication burden and reduce communication frequency. Passivity-based strategies are used to compensate for the lack of complete information, while a piecewise time-varying function is introduced to ensure prescribed-time convergence. Besides, a proper control input is designed for players with heterogeneous dynamics to track the Nash equilibrium (NE). It is proven by the Lyapunov method that the two-level neurodynamic approach converges within an adjustable time, and Zeno behavior is excluded. Finally, a numerical example of a connectivity control problem for autonomous mobile robots demonstrates the effectiveness of the proposed neurodynamic approach.
{"title":"A two-level neurodynamic approach for heterogeneous networked game under event-triggered quantized mechanism","authors":"Yiyao Xu , Mengxin Wang , Ruoyu Yuan , Sitian Qin","doi":"10.1016/j.neunet.2026.108671","DOIUrl":"10.1016/j.neunet.2026.108671","url":null,"abstract":"<div><div>Game theory provides important theoretical and methodological support for the application expansion of deep learning in complex interaction scenarios, multi-player systems and other aspects. Networked game as a key problem has been investigated in this paper. Since it is inevitable for neighbor players to communicate with each other, key challenges in practical applications is to reduce the communication cost and improve the convergence rate. Also, as the fast development of cyber physical system, players need to obey identical intrinsic dynamics. Hence, in this paper, the communication channel is equipped with one-to-one gradient-based event trigger and logarithmic quantizer, which effectively alleviate the communication burden and reduce communication frequency. Moreover, the passivity-based strategies is used to compensate the lack of complete information, while a piecewise time-varying function is introduced to ensure prescribed time convergence. Besides, proper control input is designed for heterogeneous dynamics players to track Nash equilibrium(NE). It is proven by Lyapunov method that the two-level neurodynamic owns convergence within adjustable time. Additionally, Zeno behavior is excluded. 
Finally, a numerical example of a connectivity control problem for autonomous mobile robots is provided to demonstrate the effectiveness of the proposed neurodynamic approach.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108671"},"PeriodicalIF":6.3,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146158725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
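For intuition, the logarithmic quantizer referenced in the abstract above maps each transmitted value to the nearest level of a geometric ladder, so precision is fine near zero and coarse for large magnitudes, cutting the bits needed per transmission. A minimal sketch, in which the density parameter rho and the level set {u0 * rho**i} are illustrative assumptions rather than the paper's exact construction:

```python
import math

def log_quantize(x, rho=0.5, u0=1.0):
    """Map x to the nearest sign-preserving level u0 * rho**i.

    rho in (0, 1) controls quantization density: levels cluster near
    zero and spread out for large magnitudes. Illustrative sketch,
    not the paper's exact quantizer design.
    """
    if x == 0:
        return 0.0
    sign = math.copysign(1.0, x)
    mag = abs(x)
    # index of the geometric level closest to mag (in log scale)
    i = round(math.log(mag / u0) / math.log(rho))
    return sign * u0 * rho ** i
```

Because the levels are geometric, the relative quantization error is bounded, which is what makes such quantizers compatible with convergence analysis under limited communication.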
Pub Date : 2026-07-01Epub Date: 2026-02-03DOI: 10.1016/j.neunet.2026.108674
Kang Liu , Yuqi Zhang , Shunzhi Yang , Chang-Dong Wang , Yunwen Chen , Xiaowen Ma , Zhenhua Huang
Existing graph knowledge distillation methods suffer from limited absorption of the teacher’s “dark knowledge” because they rely on simple logit alignment, which often causes overfitting or incomplete capture of underlying patterns. Additionally, relying on a single perspective severely restricts the student’s learning effectiveness and generalization ability. To address these issues, we develop a novel Multiple Interpretation Ensemble Distillation (MIED) method. It constructs a multi-interpreter composed of multiple single-layer MLPs for the student, termed the Student Interpretation (SI) component, to interpret knowledge from diversified outputs, thus avoiding representational bias from a single student output. Based on this, it introduces two effective strategies, i.e., Hybrid Sampling and Hierarchical Update. The former employs different sampling strategies for the outputs of the teacher and student (including the SI component). Specifically, the teacher’s output adopts a percentage random sampler, while the outputs of the student and SI component both leverage a positive-negative sampler. With this design, MIED can facilitate better coordination of sample selection and the learning process among the teacher, student, and SI component. The latter updates the parameters of the last layer in the student using the exponential moving average of the fused parameters of the SI component, while the parameters of other layers are updated via a regular optimizer. This enhances the robustness and generalization performance of MIED. Extensive experiments on seven real-world public datasets demonstrate that MIED outperforms existing methods in node classification tasks, yielding average improvements of 5.56% over GCN and 27.43% over MLP. 
Moreover, compared with directly using multiple students (where the number is consistent with the number of layers in the SI component), MIED achieves improvements of approximately 6.00% in time, 50.00% in space, and 0.20% in accuracy. These results indicate that MIED is scalable and generalizable, and exhibits robustness on complex samples.
{"title":"Multiple interpretation ensemble distillation for graph neural networks","authors":"Kang Liu , Yuqi Zhang , Shunzhi Yang , Chang-Dong Wang , Yunwen Chen , Xiaowen Ma , Zhenhua Huang","doi":"10.1016/j.neunet.2026.108674","DOIUrl":"10.1016/j.neunet.2026.108674","url":null,"abstract":"<div><div>Existing graph knowledge distillation methods suffer from limited absorption of the teacher’s “dark knowledge” because they rely on simple logit alignment, which often causes overfitting or incomplete capture of underlying patterns. Additionally, relying on a single perspective severely restricts the student’s learning effectiveness and generalization ability. To address these issues, we develop a novel Multiple Interpretation Ensemble Distillation (MIED) method. It constructs a multi-interpreter composed of multiple single-layer MLPs for the student, termed the Student Interpretation (SI) component, to interpret knowledge from diversified outputs, thus avoiding representational bias from a single student output. Based on this, it introduces two effective strategies, i.e., Hybrid Sampling and Hierarchical Update. The former employs different sampling strategies for the outputs of the teacher and student (including the SI component). Specifically, the teacher’s output adopts a percentage random sampler, while the outputs of the student and SI component both leverage a positive-negative sampler. With this design, MIED can facilitate better coordination of sample selection and the learning process among the teacher, student, and SI component. The latter updates the parameters of the last layer in the student using the exponential moving average of the fused parameters of the SI component, while the parameters of other layers are updated via a regular optimizer. This enhances the robustness and generalization performance of MIED. 
Extensive experiments on seven real-world public datasets demonstrate that MIED outperforms existing methods in node classification tasks, yielding average improvements of 5.56% over GCN and 27.43% over MLP. Moreover, compared with directly using multiple students (where the number is consistent with the number of layers in the SI component), MIED achieves improvements of approximately 6.00% in time, 50.00% in space, and 0.20% in accuracy. These results indicate that MIED is scalable and generalizable, and exhibits robustness on complex samples.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108674"},"PeriodicalIF":6.3,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146167504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
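The Hierarchical Update strategy described in the MIED abstract above can be pictured as follows: fuse the SI interpreters' parameters, then blend them into the student's last layer via an exponential moving average, while other layers train normally. In this sketch, averaging as the fusion rule and the decay value are illustrative assumptions, not MIED's exact recipe:

```python
import numpy as np

def hierarchical_update(last_layer_w, interpreter_ws, decay=0.99):
    """EMA-style update of the student's last-layer parameters from
    the fused parameters of the SI interpreters.

    last_layer_w: parameter array of the student's final layer.
    interpreter_ws: list of parameter arrays, one per single-layer
    MLP in the SI component. Fusion by plain averaging and the decay
    value are illustrative stand-ins for MIED's actual scheme.
    """
    fused = np.mean(interpreter_ws, axis=0)
    return decay * last_layer_w + (1.0 - decay) * fused
```

A high decay keeps the last layer stable against noise in any single interpreter, which is consistent with the robustness argument in the abstract.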
Pub Date : 2026-07-01Epub Date: 2026-01-29DOI: 10.1016/j.neunet.2026.108661
Shuai Pang, Chunhua Hu, Juan Zhao, Haifang Yu
To address the differences and correlations between features, and to fully exploit salient semantics in medical image classification tasks, this paper proposes an Interactive Axial Feature Selection Network (IAFSNet) aimed at improving feature representation and effectively filtering noise during classification, thereby enhancing classification performance. The paper introduces a newly designed Feature Interaction Module (FIM), which learns spatial differences between various features and enhances the interdependence and complementarity between local spatial features and global contextual semantics. Additionally, the paper implements a novel Axial Feature Selection Module (AFSM), which filters salient feature semantics from three perspectives: horizontal, vertical, and spatial. By adjusting thresholds, salient features are emphasized while irrelevant noise is eliminated, allowing these key features to cross-aggregate layer by layer and interact with one another, ultimately improving classification accuracy. Experimental results on four benchmark datasets demonstrate that the proposed IAFSNet exhibits excellent classification performance and robustness, significantly outperforming many existing classification methods.
{"title":"An interactive axial feature selection network for medical image classification","authors":"Shuai Pang, Chunhua Hu, Juan Zhao, Haifang Yu","doi":"10.1016/j.neunet.2026.108661","DOIUrl":"10.1016/j.neunet.2026.108661","url":null,"abstract":"<div><div>To address the differences and correlations between features, and to fully exploit salient semantics in medical image classification tasks, this paper proposes an Interactive Axial Feature Selection Network (IAFSNet) aimed at improving feature representation and effectively filtering noise during classification, thereby enhancing classification performance. The paper introduces a newly designed Feature Interaction Module (FIM), which learns spatial differences between various features and enhances the interdependence and complementarity between local spatial features and global contextual semantics. Additionally, the paper implements a novel Axial Feature Selection Module (AFSM), which filters salient feature semantics from three perspectives: horizontal, vertical, and spatial. By adjusting thresholds, salient features are emphasized while irrelevant noise is eliminated, allowing these key features to cross-aggregate layer by layer and interact with one another, ultimately improving classification accuracy. 
Experimental results on four benchmark datasets demonstrate that the proposed IAFSNet exhibits excellent classification performance and robustness, significantly outperforming many existing classification methods.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108661"},"PeriodicalIF":6.3,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146114790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
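For intuition, threshold-based axial selection of the kind the AFSM abstract describes can be sketched on a (C, H, W) feature map: score each row (horizontal), column (vertical), and channel (spatial) and zero out entries whose score falls below a threshold. The mean-activation scoring and min-max normalization here are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

def axial_select(feat, tau=0.5):
    """Toy axial feature selection on a (C, H, W) feature map.

    Scores rows, columns, and channels by mean activation, rescales
    scores to [0, 1], and keeps only entries whose row, column, and
    channel all pass threshold tau. Illustrative sketch of the AFSM
    idea, not IAFSNet's actual module.
    """
    def gate(scores):
        rng = scores.max() - scores.min()
        norm = (scores - scores.min()) / (rng + 1e-8)
        return (norm >= tau).astype(feat.dtype)

    row_gate = gate(feat.mean(axis=(0, 2)))    # (H,) horizontal view
    col_gate = gate(feat.mean(axis=(0, 1)))    # (W,) vertical view
    chan_gate = gate(feat.mean(axis=(1, 2)))   # (C,) spatial/channel view
    return (feat * row_gate[None, :, None]
                 * col_gate[None, None, :]
                 * chan_gate[:, None, None])
```

Raising tau makes the selection stricter, suppressing more low-activation regions as noise; a learned or adaptive threshold per axis would be the natural refinement.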