NaturalL2S: End-to-end high-quality multispeaker lip-to-speech synthesis with differential digital signal processing
Pub Date: 2026-02-01 DOI: 10.1016/j.neunet.2025.108163
Yifan Liang, Fangkun Liu, Andong Li, Xiaodong Li, Chengyou Lei, Chengshi Zheng
Recent advancements in visual speech recognition (VSR) have promoted progress in lip-to-speech synthesis, where pre-trained VSR models enhance the intelligibility of synthesized speech by providing valuable semantic information. The success achieved by cascade frameworks, which combine pseudo-VSR with pseudo-text-to-speech (TTS) or implicitly utilize the transcribed text, highlights the benefits of leveraging VSR models. However, these methods typically rely on mel-spectrograms as an intermediate representation, which may introduce a key bottleneck: the domain gap between synthetic mel-spectrograms, generated from inherently error-prone lip-to-speech mappings, and real mel-spectrograms used to train vocoders. This mismatch inevitably degrades synthesis quality. To bridge this gap, we propose Natural Lip-to-Speech (NaturalL2S), an end-to-end framework that jointly trains the vocoder with the acoustic inductive priors. Specifically, our architecture introduces a fundamental frequency (F0) predictor to explicitly model prosodic variations, where the predicted F0 contour drives a differentiable digital signal processing (DDSP) synthesizer to provide acoustic priors for subsequent refinement. Notably, the proposed system achieves satisfactory performance on speaker similarity without requiring explicit speaker embeddings. Both objective metrics and subjective listening tests demonstrate that NaturalL2S significantly enhances synthesized speech quality compared to existing state-of-the-art methods. Audio samples are available on our demonstration page: https://yifan-liang.github.io/NaturalL2S/.
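To make the role of the DDSP synthesizer concrete, the sketch below renders a coarse harmonic waveform from a frame-level F0 contour and loudness curve. It is a minimal NumPy illustration under assumed sample rate, hop size, and harmonic count, not the NaturalL2S synthesizer itself, whose per-harmonic amplitudes and noise component would be predicted by the network.

```python
import numpy as np

def harmonic_synth(f0_frames, amp_frames, sr=16000, hop=256, n_harmonics=16):
    """Render a coarse harmonic signal from frame-level F0 and amplitude.

    f0_frames, amp_frames: 1-D arrays of per-frame F0 (Hz) and loudness.
    Frames are linearly upsampled to sample rate, and harmonics above the
    Nyquist frequency are masked out before summation.
    """
    n_samples = len(f0_frames) * hop
    t_frame = np.arange(len(f0_frames)) * hop
    t_sample = np.arange(n_samples)
    f0 = np.interp(t_sample, t_frame, f0_frames)      # sample-level F0
    amp = np.interp(t_sample, t_frame, amp_frames)    # sample-level loudness

    phase = 2 * np.pi * np.cumsum(f0) / sr            # running phase of the fundamental
    k = np.arange(1, n_harmonics + 1)[:, None]        # harmonic indices
    alias_mask = (k * f0[None, :] < sr / 2).astype(f0.dtype)
    # Equal per-harmonic weights here; a neural network would predict them.
    signal = (alias_mask * np.sin(k * phase[None, :])).sum(axis=0)
    return amp * signal / n_harmonics

# Example: a ~1-second sweep from 120 Hz to 180 Hz with a slow fade-in.
frames = 62
f0_contour = np.linspace(120.0, 180.0, frames)
loudness = np.linspace(0.1, 0.8, frames)
prior_waveform = harmonic_synth(f0_contour, loudness)
print(prior_waveform.shape)  # (15872,)
```

In an end-to-end system of this kind, the waveform produced by such an oscillator bank serves only as an acoustic prior that a learned refinement module then corrects.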
{"title":"NaturalL2S: End-to-end high-quality multispeaker lip-to-speech synthesis with differential digital signal processing.","authors":"Yifan Liang, Fangkun Liu, Andong Li, Xiaodong Li, Chengyou Lei, Chengshi Zheng","doi":"10.1016/j.neunet.2025.108163","DOIUrl":"10.1016/j.neunet.2025.108163","url":null,"abstract":"<p><p>Recent advancements in visual speech recognition (VSR) have promoted progress in lip-to-speech synthesis, where pre-trained VSR models enhance the intelligibility of synthesized speech by providing valuable semantic information. The success achieved by cascade frameworks, which combine pseudo-VSR with pseudo-text-to-speech (TTS) or implicitly utilize the transcribed text, highlights the benefits of leveraging VSR models. However, these methods typically rely on mel-spectrograms as an intermediate representation, which may introduce a key bottleneck: the domain gap between synthetic mel-spectrograms, generated from inherently error-prone lip-to-speech mappings, and real mel-spectrograms used to train vocoders. This mismatch inevitably degrades synthesis quality. To bridge this gap, we propose Natural Lip-to-Speech (NaturalL2S), an end-to-end framework that jointly trains the vocoder with the acoustic inductive priors. Specifically, our architecture introduces a fundamental frequency (F0) predictor to explicitly model prosodic variations, where the predicted F0 contour drives a differentiable digital signal processing (DDSP) synthesizer to provide acoustic priors for subsequent refinement. Notably, the proposed system achieves satisfactory performance on speaker similarity without requiring explicit speaker embeddings. Both objective metrics and subjective listening tests demonstrate that NaturalL2S significantly enhances synthesized speech quality compared to existing state-of-the-art methods. Audio samples are available on our demonstration page: https://yifan-liang.github.io/NaturalL2S/.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"194 ","pages":"108163"},"PeriodicalIF":6.3,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145294364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emotion-Aware multimodal deepfake detection
Pub Date: 2026-01-31 DOI: 10.1016/j.neunet.2026.108675
Teng Zhang, Gen Li, Yanhui Xiao, Huawei Tian, Yun Cao
With the continuous advancement of Deepfake techniques, traditional unimodal detection methods struggle to address the challenges posed by multimodal manipulations. Most existing approaches rely on large-scale training data, which limits their generalization to unseen identities or different manipulation types in few-shot settings. In this paper, we propose an emotion-aware multimodal Deepfake detection method that exploits emotion signals for forgery detection. Specifically, we design an emotion embedding extractor (Emoencoder) to capture emotion representations within modalities. Then, we employ Emotion-Aware Contrastive Learning and Cross-Modal Contrastive Learning to capture cross-modal inconsistencies and enhance modality feature extraction. Furthermore, we propose a Text-Guided Semantic Fusion module, where the text modality serves as a semantic anchor to guide audio-visual feature interactions for multimodal feature fusion. To validate our approach under data-limited conditions and unseen identities, we employ a cross-identity few-shot training strategy on benchmark datasets. Experimental results demonstrate that our method outperforms state-of-the-art approaches and generalizes better to both unseen identities and manipulation types.
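As context for the contrastive components above, the sketch below shows a symmetric cross-modal InfoNCE loss of the kind such Cross-Modal Contrastive Learning objectives typically build on; the embedding size, batch pairing, and temperature are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cross_modal_info_nce(audio_emb, visual_emb, temperature=0.07):
    """Symmetric InfoNCE loss between paired audio and visual embeddings.

    audio_emb, visual_emb: (batch, dim) tensors where row i of each tensor
    comes from the same clip; mismatched rows act as negatives.
    """
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)
    logits = a @ v.t() / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    loss_a2v = F.cross_entropy(logits, targets)   # audio queries, visual keys
    loss_v2a = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_a2v + loss_v2a)

# Toy usage with random embeddings.
audio = torch.randn(8, 256)
visual = torch.randn(8, 256)
print(cross_modal_info_nce(audio, visual).item())
```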
{"title":"Emotion-Aware multimodal deepfake detection","authors":"Teng Zhang , Gen Li , Yanhui Xiao , Huawei Tian , Yun Cao","doi":"10.1016/j.neunet.2026.108675","DOIUrl":"10.1016/j.neunet.2026.108675","url":null,"abstract":"<div><div>With the continuous advancement of Deepfake techniques, traditional unimodal detection methods struggle to address the challenges posed by multimodal manipulations. Most existing approaches rely on large-scale training data, which limits their generalization to unseen identities or different manipulation types in few-shot settings. In this paper, we propose an emotion-aware multimodal Deepfake detection method that exploits emotion signals for forgery detection. Specifically, we design an emotion embedding extractor (Emoencoder) to capture emotion representations within modalities. Then, we employ Emotion-Aware Contrastive Learning and Cross-Modal Contrastive Learning to capture cross-modal inconsistencies and enhance modality feature extraction. Furthermore, we propose a Text-Guided Semantic Fusion module, where the text modality serves as a semantic anchor to guide audio-visual feature interactions for multimodal feature fusion. To validate our approach under data-limited conditions and unseen identities, we employ a cross-identity few-shot training strategy on benchmark datasets. Experimental results demonstrate that our method outperforms SOTAs and demonstrates superior generalization to both unseen identities and manipulation types.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108675"},"PeriodicalIF":6.3,"publicationDate":"2026-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146133401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Event-triggered decentralized adaptive critic learning control for interconnected systems with nonlinear inequality state constraints
Pub Date: 2026-01-31 DOI: 10.1016/j.neunet.2026.108646
Wenqian Du, Mingduo Lin, Guoling Yuan, Bo Zhao
In this paper, an event-triggered decentralized adaptive critic learning (ACL) control method is proposed for interconnected systems with nonlinear inequality state constraints. First, by introducing a slack function, the nonlinear inequality state constraints of each original isolated subsystem are transformed into equality forms, and the original isolated subsystem is then augmented into an unconstrained one. Second, by establishing a cost function with discount factors for each isolated subsystem, a local policy-iteration-based decentralized control law is developed by solving the Hamilton–Jacobi–Bellman equation with the help of a local critic neural network (NN) for each isolated subsystem. By developing a novel event-triggering mechanism for each isolated subsystem, the decentralized control policy is updated only at the triggering instants, which helps save computational and communication resources; the event-triggered decentralized control law of each isolated subsystem is thus derived. The overall optimal control for the entire interconnected system is then obtained by assembling the developed event-triggered decentralized control laws. Furthermore, the closed-loop nonlinear interconnected system and the weight estimation errors of the local critic NNs are guaranteed to be uniformly ultimately bounded. Finally, the effectiveness of the proposed method is validated through two comparative simulation examples.
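To make the constraint transformation concrete, one standard slack-variable construction is sketched below; the paper's specific slack function, slack dynamics, and stage cost may differ in detail.

```latex
% Illustrative slack-variable transformation (a common construction; the
% paper's exact slack function may differ). An inequality state constraint
% on the i-th isolated subsystem is converted into an equality through a
% squared slack variable, and the subsystem is augmented so that the new
% state is unconstrained:
\[
  h_i(x_i) \le 0
  \quad\Longleftrightarrow\quad
  h_i(x_i) + s_i^{2} = 0 ,
  \qquad
  z_i = \bigl[\, x_i^{\top},\; s_i \,\bigr]^{\top}.
\]
% A discounted cost for the augmented subsystem then takes the form
\[
  J_i\bigl(z_i(0)\bigr)
  = \int_{0}^{\infty} e^{-\lambda_i t}\,
    r_i\bigl(z_i(t),\, u_i(t)\bigr)\, \mathrm{d}t ,
\]
% whose Hamilton–Jacobi–Bellman equation is solved approximately with a
% local critic neural network, updated only at the event-triggering instants.
```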
{"title":"Event-triggered decentralized adaptive critic learning control for interconnected systems with nonlinear inequality state constraints","authors":"Wenqian Du , Mingduo Lin , Guoling Yuan , Bo Zhao","doi":"10.1016/j.neunet.2026.108646","DOIUrl":"10.1016/j.neunet.2026.108646","url":null,"abstract":"<div><div>In this paper, an event-triggered decentralized adaptive critic learning (ACL) control method is proposed for interconnected systems with nonlinear inequality state constraints. First, by introducing a slack function, the nonlinear inequality state constraints of original isolated subsystem are transformed into equality forms, and then the original isolated subsystem is augmented to an unconstrained one. Then, by establishing a cost function with discount factors for each isolated subsystem, a local policy iteration-based decentralized control law is developed by solving the Hamilton–Jacobi–Bellman equation with the help of a local critic neural network (NN) for each isolated subsystem. Through developing a novel event-triggering mechanism for each isolated subsystem, the decentralized control policy is updated at the triggering instants only, which assists to save the computational and communication resources. Hereafter, the event-triggered decentralized control law of isolated subsystem is derived. Then, the overall optimal control for the entire interconnected system is derived by constituting an array of developed event-triggered decentralized control laws. Furthermore, the closed-loop nonlinear interconnected system and the weight estimation errors of local critic NNs are guaranteed to be uniformly ultimately bounded. Finally, the effectiveness of the proposed method is validated through two comparative simulation examples.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108646"},"PeriodicalIF":6.3,"publicationDate":"2026-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive sample repulsion against class-specific counterfactuals for explainable imbalanced classification
Pub Date: 2026-01-30 DOI: 10.1016/j.neunet.2026.108652
Yu Hao, Xin Gao, Xinping Diao, Yuan Li, Yukun Lin, Tianyang Chen, Qiangwei Li, Jiawen Lu
Enhancing model classification capability for samples within overlapping regions in complex feature spaces remains a key challenge in imbalanced classification research. Existing mainstream data-level and algorithm-level methods primarily rely on original sample distribution information to reduce the impact of overlap, without deeply modeling the causal relationship between features and labels. Furthermore, these approaches often overlook instance-level explanations that could guide the mining of deep discriminative information for samples of different classes in overlapping regions, which constrains improvements in both classification performance and model credibility. This paper proposes an explainable imbalanced classification framework with adaptive sample repulsion against class-specific counterfactuals (CSCF-SR), forming a closed loop between explanation generation and classification decisions by dynamically regulating the feature-space distribution through generated counterfactual samples. Two core phases are jointly optimized. (1) Counterfactual searching: a class-specific dual-actor architecture based on reinforcement learning decouples perturbation policy learning for majority and minority classes. A multi-step dynamic perturbation mechanism is designed to control counterfactual search behavior more precisely and smoothly, effectively generating reliable counterfactual samples. (2) Adaptive sample repulsion against counterfactuals: exploiting the inter-class discriminative information in displacement vectors between counterfactual and original samples, each original sample is adaptively perturbed along the direction opposite to its counterfactual. This fine-grained regulation gradually displaces samples from the overlapping region and clarifies class boundaries. Experiments on 50 imbalanced datasets demonstrate that CSCF-SR outperforms 27 typical imbalanced classification methods in both F1-score and G-mean, with more pronounced improvements on 25 datasets with severe class overlap.
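The repulsion step can be illustrated with a few lines of NumPy: each sample moves a small step along the unit vector pointing away from its counterfactual. The fixed step size below is a simplifying assumption; CSCF-SR adapts the perturbation per sample.

```python
import numpy as np

def repel_from_counterfactuals(X, X_cf, step=0.1, eps=1e-8):
    """Push each original sample away from its class-specific counterfactual.

    X:    (n, d) original samples in feature space.
    X_cf: (n, d) counterfactual samples found for each row of X.
    Each sample moves by `step` along the unit vector opposite to the
    displacement toward its counterfactual, nudging it out of the
    overlapping region and away from the decision boundary.
    """
    displacement = X - X_cf                              # points away from the counterfactual
    norms = np.linalg.norm(displacement, axis=1, keepdims=True)
    direction = displacement / (norms + eps)
    return X + step * direction

# Toy usage: two 2-D samples and their counterfactuals.
X = np.array([[0.2, 0.5], [1.0, 1.1]])
X_cf = np.array([[0.5, 0.5], [0.8, 1.4]])
print(repel_from_counterfactuals(X, X_cf, step=0.2))
```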
{"title":"Adaptive sample repulsion against class-specific counterfactuals for explainable imbalanced classification","authors":"Yu Hao , Xin Gao , Xinping Diao , Yuan Li , Yukun Lin , Tianyang Chen , Qiangwei Li , Jiawen Lu","doi":"10.1016/j.neunet.2026.108652","DOIUrl":"10.1016/j.neunet.2026.108652","url":null,"abstract":"<div><div>Enhancing model classification capability for samples within overlapping regions in complex feature spaces remains a key challenge in imbalanced classification research. Existing mainstream methods at the data-level and algorithm-level primarily rely on original sample distribution information to reduce overlap impact, without deeply modeling the causal relationship between features and labels. Furthermore, these approaches often overlook instance-level explanations that could guide deep discriminative information mining for samples of different classes in overlapping regions, thus the improvement on classification performance and model credibility may be constrained. This paper proposes an explainable imbalanced classification framework with adaptive sample repulsion against class-specific counterfactuals (CSCF-SR), forming a closed-loop between explanation generation and classification decisions by dynamically regulating the feature-space distribution through generated counterfactual samples. Two core phases are jointly optimized. (1) Counterfactual searching: a class-specific dual-actor architecture based on reinforcement learning decouples perturbation policy learning for majority and minority classes. A multi-step dynamic perturbation mechanism is designed to control counterfactual search behavior more precisely and smoothly, effectively generating reliable counterfactual samples. (2) Adaptive sample repulsion against counterfactuals: exploiting the inter-class discriminative information in displacement vectors between counterfactual and original samples, each original sample is adaptively perturbed along the direction opposite to its counterfactual. This fine-grained regulation gradually displaces samples from the overlapping region and clarifies class boundaries. Experiments on 50 imbalanced datasets demonstrate that CSCF-SR has a performance advantage over 27 typical imbalanced classification methods on both F1-score and G-mean, with more pronounced improvements on 25 datasets with severe class overlap.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108652"},"PeriodicalIF":6.3,"publicationDate":"2026-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-timescale representation with adaptive routing for deep tabular learning under temporal shift
Pub Date: 2026-01-30 DOI: 10.1016/j.neunet.2026.108670
Tianyu Wang, Maite Zhang, Mingxuan Lu, Mian Li
In real-world applications, tabular datasets often evolve over time, leading to temporal shift that degrades neural network performance over long deployment horizons. Most existing temporal encoding or adaptation solutions treat time cues as fixed auxiliary variables at a single scale. Motivated by the multi-horizon nature of temporal shifts with heterogeneous temporal dynamics, this paper presents TARS (Temporal Abstraction with Routed Scales), a novel plug-and-play method for robust tabular learning under temporal shift, applicable to various deep learning model backbones. First, an explicit temporal encoder decomposes timestamps into short-term recency, mid-term periodicity, and long-term contextual embeddings with structured memory. Next, an implicit drift encoder tracks higher-order distributional statistics at the same aligned timescales, producing drift signals that reflect ongoing temporal dynamics. These signals drive a drift-aware routing mechanism that adaptively weights the explicit temporal pathways, emphasizing the most relevant timescales under current conditions. Finally, a feature-temporal fusion layer integrates the routed temporal representation with the original features, injecting context-aware bias. Extensive experiments on eight real-world datasets from the TabReD benchmark show that TARS consistently outperforms competitive baselines across various backbone models, achieving up to +2.38% average relative improvement with an MLP backbone and +4.08% with DCNv2, among others. Ablation studies verify the complementary contributions of all four modules. These results highlight the effectiveness of TARS for improving the temporal robustness of existing deep tabular models.
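As a rough illustration of explicit multi-scale timestamp features, the sketch below computes a recency term plus daily and weekly sinusoidal encodings; the chosen scales and feature set are assumptions for illustration and are much simpler than the TARS temporal and drift encoders.

```python
import numpy as np

def multiscale_time_features(timestamps, t_ref):
    """Decompose UNIX timestamps into simple multi-scale temporal features.

    timestamps: (n,) array of seconds since the epoch.
    t_ref:      reference time (e.g., end of the training window).
    Returns an (n, 5) array: short-term recency in days, plus sin/cos
    encodings of daily and weekly periodicity. A learned encoder would
    map richer variants of these cues into embeddings.
    """
    ts = np.asarray(timestamps, dtype=np.float64)
    day, week = 86400.0, 7 * 86400.0
    recency = (t_ref - ts) / day                  # short-term: days before t_ref
    daily = 2 * np.pi * (ts % day) / day          # mid-term: time of day
    weekly = 2 * np.pi * (ts % week) / week       # longer-term: day of week
    return np.stack([recency,
                     np.sin(daily), np.cos(daily),
                     np.sin(weekly), np.cos(weekly)], axis=1)

# Toy usage: three events, referenced to the latest one.
t = np.array([1_700_000_000, 1_700_086_400, 1_700_259_200])
print(multiscale_time_features(t, t_ref=t.max()).round(3))
```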
Efficient semantic segmentation via logit-guided feature distillation
Pub Date: 2026-01-29 DOI: 10.1016/j.neunet.2026.108663
Xuyi Yu, Shang Lou, Yinghai Zhao, Huipeng Zhang, Kuizhi Mei
Knowledge Distillation (KD) is a critical technique for model compression, facilitating the transfer of implicit knowledge from a teacher model to a more compact, deployable student model. KD can be generally divided into two categories: logit distillation and feature distillation. Feature distillation has been predominant in achieving state-of-the-art (SOTA) performance, but recent advances in logit distillation have begun to narrow the gap. We propose a Logit-guided Feature Distillation (LFD) framework that combines the strengths of both logit and feature distillation to enhance the efficacy of knowledge transfer, particularly leveraging the rich classification information inherent in logits for semantic segmentation tasks. Furthermore, it is observed that Deep Neural Networks (DNNs) only manifest task-relevant characteristics at sufficient depths, which may be a limiting factor in achieving higher accuracy. In this work, we introduce a collaborative distillation method that preemptively focuses on critical pixels and categories in the early stage. We employ logits from deep layers to generate fine-grained spatial masks that are directly conveyed to the feature distillation stage, thereby inducing spatial gradient disparities. Additionally, we generate class masks that dynamically modulate the weights of shallow auxiliary heads, ensuring that class-relevant features can be calibrated by the primary head. A novel shared auxiliary head distillation approach is also presented. Experiments on the Cityscapes, Pascal VOC, and CamVid datasets show that the proposed method achieves competitive performance while maintaining low memory usage. Our code will be released at https://github.com/fate2715/LFD.
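The following PyTorch sketch illustrates the general idea of deriving a per-pixel mask from teacher logits and using it to reweight a feature-distillation loss; the entropy-based mask and plain MSE here are simplifying assumptions, not LFD's exact masks or losses.

```python
import torch
import torch.nn.functional as F

def logit_guided_feature_loss(t_logits, t_feat, s_feat, tau=1.0):
    """Weight a feature-distillation MSE by a mask derived from teacher logits.

    t_logits: (B, C, H, W) teacher segmentation logits.
    t_feat:   (B, D, Hf, Wf) teacher features; s_feat: matching student features.
    Pixels where the teacher is uncertain (high prediction entropy) get larger
    weights, focusing distillation on hard regions near class boundaries.
    """
    prob = F.softmax(t_logits / tau, dim=1)
    entropy = -(prob * prob.clamp_min(1e-8).log()).sum(dim=1, keepdim=True)  # (B,1,H,W)
    mask = entropy / entropy.amax(dim=(2, 3), keepdim=True).clamp_min(1e-8)
    mask = F.interpolate(mask, size=s_feat.shape[-2:], mode="bilinear",
                         align_corners=False)
    per_pixel = ((s_feat - t_feat) ** 2).mean(dim=1, keepdim=True)           # (B,1,Hf,Wf)
    return (mask * per_pixel).mean()

# Toy usage with random tensors (19 classes, as in Cityscapes).
t_logits = torch.randn(2, 19, 64, 64)
t_feat, s_feat = torch.randn(2, 128, 32, 32), torch.randn(2, 128, 32, 32)
print(logit_guided_feature_loss(t_logits, t_feat, s_feat).item())
```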
{"title":"Efficient semantic segmentation via logit-guided feature distillation","authors":"Xuyi Yu , Shang Lou , Yinghai Zhao , Huipeng Zhang , Kuizhi Mei","doi":"10.1016/j.neunet.2026.108663","DOIUrl":"10.1016/j.neunet.2026.108663","url":null,"abstract":"<div><div>Knowledge Distillation (KD) is a critical technique for model compression, facilitating the transfer of implicit knowledge from a teacher model to a more compact, deployable student model. KD can be generally divided into two categories: logit distillation and feature distillation. Feature distillation has been predominant in achieving state-of-the-art (SOTA) performance, but recent advances in logit distillation have begun to narrow the gap. We propose a Logit-guided Feature Distillation (LFD) framework that combines the strengths of both logit and feature distillation to enhance the efficacy of knowledge transfer, particularly leveraging the rich classification information inherent in logits for semantic segmentation tasks. Furthermore, it is observed that Deep Neural Networks (DNNs) only manifest task-relevant characteristics at sufficient depths, which may be a limiting factor in achieving higher accuracy. In this work, we introduce a collaborative distillation method that preemptively focuses on critical pixels and categories in the early stage. We employ logits from deep layers to generate fine-grained spatial masks that are directly conveyed to the feature distillation stage, thereby inducing spatial gradient disparities. Additionally, we generate class masks that dynamically modulate the weights of shallow auxiliary heads, ensuring that class-relevant features can be calibrated by the primary head. A novel shared auxiliary head distillation approach is also presented. Experiments on the Cityscapes, Pascal VOC, and CamVid datasets show that the proposed method achieves competitive performance while maintaining low memory usage. Our codes will be released in <span><span>https://github.com/fate2715/LFD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108663"},"PeriodicalIF":6.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146114774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Resolving ambiguity in code refinement via conidfine: A conversationally-Aware framework with disambiguation and targeted retrieval
Pub Date: 2026-01-29 DOI: 10.1016/j.neunet.2026.108650
Aoyu Song, Afizan Azman, Shanzhi Gu, Fangjian Jiang, Jianchi Du, Tailong Wu, Mingyang Geng, Jia Li
Code refinement is a vital aspect of software development, involving the review and enhancement of code contributions made by developers. A critical challenge in this process arises from unclear or ambiguous review comments, which can hinder developers’ understanding of the required changes. Our preliminary study reveals that conversations between developers and reviewers often contain valuable information that can help resolve such ambiguous review suggestions. However, leveraging conversational data to address this issue poses two key challenges: (1) enabling the model to autonomously determine whether a review suggestion is ambiguous, and (2) effectively extracting the relevant segments from the conversation that can aid in resolving the ambiguity.
In this paper, we propose a novel method for addressing ambiguous review suggestions by leveraging conversations between reviewers and developers. To tackle the above two challenges, we introduce an Ambiguous Discriminator that uses multi-task learning to classify ambiguity and generate type-aware confusion points from a GPT-4-labeled dataset. These confusion points guide a Type-Driven Multi-Strategy Retrieval Framework that applies targeted strategies based on categories like Inaccurate Localization, Unclear Expression, and Lack of Specific Guidance to extract actionable information from the conversation context. To support this, we construct a semantic auxiliary instruction library containing spatial indicators, clarification patterns, and action-oriented verbs, enabling precise alignment between review suggestions and informative conversation segments. Our method is evaluated on two widely-used code refinement datasets CodeReview and CodeReview-New, where we demonstrate that our method significantly enhances the performance of various state-of-the-art models, including TransReview, T5-Review, CodeT5, CodeReviewer and ChatGPT. Furthermore, we explore in depth how conversational information improves the model’s ability to address fine-grained situations, and we conduct human evaluations to assess the accuracy of ambiguity detection and the correctness of generated confusion points. We are the first to introduce the issue of ambiguous review suggestions in the code refinement domain and propose a solution that not only addresses these challenges but also sets the foundation for future research. Our method provides valuable insights into improving the clarity and effectiveness of review suggestions, offering a promising direction for advancing code refinement techniques.
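As a deliberately simplified illustration of type-driven retrieval, the sketch below scores conversation turns with cue words tied to each confusion type; the cue lists and lexical matching are placeholders for the paper's semantic auxiliary instruction library and retrieval strategies.

```python
# Illustrative cue words per confusion type; the paper's auxiliary instruction
# library is richer (spatial indicators, clarification patterns, action-oriented
# verbs) and its matching is semantic rather than purely lexical.
CUES = {
    "inaccurate_localization": ["line", "above", "below", "function", "block"],
    "unclear_expression": ["mean", "clarify", "in other words", "specifically"],
    "lack_of_specific_guidance": ["should", "instead", "replace", "use", "rename"],
}

def retrieve_segments(conversation, confusion_type, top_k=2):
    """Return the top_k conversation turns most relevant to a confusion type."""
    cues = CUES[confusion_type]
    scored = []
    for turn in conversation:
        text = turn.lower()
        score = sum(text.count(cue) for cue in cues)
        scored.append((score, turn))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [turn for score, turn in scored[:top_k] if score > 0]

conversation = [
    "Reviewer: this helper should be renamed, use a verb phrase instead.",
    "Developer: do you mean the function above the parser block?",
    "Reviewer: yes, the one two lines below the import section.",
]
print(retrieve_segments(conversation, "inaccurate_localization"))
```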
{"title":"Resolving ambiguity in code refinement via conidfine: A conversationally-Aware framework with disambiguation and targeted retrieval","authors":"Aoyu Song , Afizan Azman , Shanzhi Gu , Fangjian Jiang , Jianchi Du , Tailong Wu , Mingyang Geng , Jia Li","doi":"10.1016/j.neunet.2026.108650","DOIUrl":"10.1016/j.neunet.2026.108650","url":null,"abstract":"<div><div>Code refinement is a vital aspect of software development, involving the review and enhancement of code contributions made by developers. A critical challenge in this process arises from unclear or ambiguous review comments, which can hinder developers’ understanding of the required changes. Our preliminary study reveals that conversations between developers and reviewers often contain valuable information that can help resolve such ambiguous review suggestions. However, leveraging conversational data to address this issue poses two key challenges: (1) enabling the model to autonomously determine whether a review suggestion is ambiguous, and (2) effectively extracting the relevant segments from the conversation that can aid in resolving the ambiguity.</div><div>In this paper, we propose a novel method for addressing ambiguous review suggestions by leveraging conversations between reviewers and developers. To tackle the above two challenges, we introduce an <strong>Ambiguous Discriminator</strong> that uses multi-task learning to classify ambiguity and generate type-aware confusion points from a GPT-4-labeled dataset. These confusion points guide a <strong>Type-Driven Multi-Strategy Retrieval Framework</strong> that applies targeted strategies based on categories like <em>Inaccurate Localization, Unclear Expression</em>, and <em>Lack of Specific Guidance</em> to extract actionable information from the conversation context. To support this, we construct a semantic auxiliary instruction library containing spatial indicators, clarification patterns, and action-oriented verbs, enabling precise alignment between review suggestions and informative conversation segments. Our method is evaluated on two widely-used code refinement datasets CodeReview and CodeReview-New, where we demonstrate that our method significantly enhances the performance of various state-of-the-art models, including TransReview, T5-Review, CodeT5, CodeReviewer and ChatGPT. Furthermore, we explore in depth how conversational information improves the model’s ability to address fine-grained situations, and we conduct human evaluations to assess the accuracy of ambiguity detection and the correctness of generated confusion points. We are the first to introduce the issue of ambiguous review suggestions in the code refinement domain and propose a solution that not only addresses these challenges but also sets the foundation for future research. 
Our method provides valuable insights into improving the clarity and effectiveness of review suggestions, offering a promising direction for advancing code refinement techniques.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108650"},"PeriodicalIF":6.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Warm-start or cold-start? A comparison of generalizability in gradient-based hyperparameter tuning
Pub Date: 2026-01-29 DOI: 10.1016/j.neunet.2026.108647
Yubo Zhou, Jun Shu, Chengli Tan, Haishan Ye, Quanziang Wang, Junmin Liu, Deyu Meng, Ivor Tsang, Guang Dai
Bilevel optimization (BO) has garnered increasing attention in hyperparameter tuning. BO methods commonly adopt one of two strategies for the inner level: cold-start, which uses a fixed initialization, and warm-start, which reuses the last approximate inner solution as the starting point for each call of the inner solver. Previous studies have mainly emphasized that warm-start exhibits better convergence properties; here, we provide a detailed comparison of the two strategies from a generalization perspective. Our findings indicate that, compared to the cold-start strategy, the warm-start strategy exhibits worse generalization performance, such as more severe overfitting on the validation set. To explain this, we establish generalization bounds for the two strategies. We reveal that the warm-start strategy produces a worse generalization upper bound due to its closer interaction with the inner-level dynamics, which naturally leads to poorer generalization performance. Inspired by these theoretical results, we propose several approaches to enhance the generalization capability of the warm-start strategy and narrow its gap with cold-start, in particular a novel random perturbation initialization method. Experiments validate the soundness of our theoretical analysis and the effectiveness of the proposed approaches.
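The difference between the two inner-level strategies, and the random-perturbation variant, can be seen in a toy bilevel setting where the inner problem is ridge regression solved by a few gradient steps. All problem sizes, step counts, and the noise scale below are illustrative assumptions, and the outer hypergradient update on the regularization weight is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 10)), rng.normal(size=50)

def inner_solve(lam, w0, steps=20, lr=0.01):
    """Approximately solve the inner ridge problem starting from w0."""
    w = w0.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w
        w -= lr * grad
    return w

lam, w_prev = 0.5, np.zeros(10)
for outer_step in range(3):
    w_cold = inner_solve(lam, np.zeros(10))                        # cold-start: fixed init
    w_warm = inner_solve(lam, w_prev)                              # warm-start: reuse last solution
    w_perturbed = inner_solve(lam, w_prev + 0.05 * rng.normal(size=10))  # warm-start + perturbation
    w_prev = w_warm
    # In a full bilevel method, a validation loss at the inner solution would
    # now drive a hypergradient update of lam; that outer update is omitted.
    print(outer_step,
          round(float(np.linalg.norm(w_cold - w_warm)), 4),
          round(float(np.linalg.norm(w_warm - w_perturbed)), 4))
```

The printed gaps show how the warm-started iterates drift away from the cold-started ones across outer steps, which is exactly the coupling to inner-level dynamics that the generalization analysis above targets.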
{"title":"Warm-start or cold-start? A comparison of generalizability in gradient-based hyperparameter tuning","authors":"Yubo Zhou , Jun Shu , Chengli Tan , Haishan Ye , Quanziang Wang , Junmin Liu , Deyu Meng , Ivor Tsang , Guang Dai","doi":"10.1016/j.neunet.2026.108647","DOIUrl":"10.1016/j.neunet.2026.108647","url":null,"abstract":"<div><div>Bilevel optimization (BO) has garnered increasing attention in hyperparameter tuning. BO methods are commonly employed with two distinct strategies for the inner-level: cold-start, which uses a fixed initialization, and warm-start, which uses the last inner approximation solution as the starting point for the inner solver each time, respectively. Previous studies mainly stated that warm-start exhibits better convergence properties, while we provide a detailed comparison of these two strategies from a generalization perspective. Our findings indicate that, compared to the cold-start strategy, warm-start strategy exhibits worse generalization performance, such as more severe overfitting on the validation set. To explain this, we establish generalization bounds for the two strategies. We reveal that warm-start strategy produces a worse generalization upper bound due to its closer interaction with the inner-level dynamics, naturally leading to poor generalization performance. Inspired by the theoretical results, we propose several approaches to enhance the generalization capability of warm-start strategy and narrow its gap with cold-start, especially a novel random perturbation initialization method. Experiments validate the soundness of our theoretical analysis and the effectiveness of the proposed approaches.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108647"},"PeriodicalIF":6.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146127089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NG-SNN: A neurogenesis-inspired dynamic adaptive framework for efficient spike classification
Pub Date: 2026-01-29 DOI: 10.1016/j.neunet.2026.108656
Jing Tang, Depeng Li, Zhenyu Zhang, Zhigang Zeng
Spiking neural networks (SNNs) are designed for low-power neuromorphic computing. A widely adopted hybrid paradigm decouples feature extraction from classification to improve biological plausibility and modularity. However, this decoupling concentrates decision making in the downstream classifier, which in many systems becomes the limiting factor for both accuracy and efficiency. Hand-preset, fixed topologies risk either redundancy or insufficient capacity, and surrogate-gradient training remains computationally costly. Biological neurogenesis is the brain’s mechanism for adaptively adding new neurons to build efficient, task-specific circuits. Inspired by this process, we propose the neurogenesis-inspired spiking neural network (NG-SNN), a dynamic adaptive framework that uses two key innovations to address these challenges. Specifically, we first introduce a supervised incremental construction mechanism that dynamically grows a task-optimal structure by selectively integrating neurons under a contribution criterion. Second, we devise an activity-dependent analytical learning method that replaces iterative optimization with single-shot and adaptive weight computation for each structural update, drastically improving training efficiency. Therefore, NG-SNN uniquely integrates dynamic structural adaptation with efficient non-iterative learning, forming a self-organizing and rapidly converging classification system. Moreover, this neurogenesis-driven process endows NG-SNN with a highly compact structure that requires significantly fewer parameters. Extensive experiments demonstrate that our NG-SNN matches or outperforms its competitors on diverse datasets, without the overhead of iterative training and manual architecture tuning.
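A minimal NumPy sketch of the two ingredients named above: incremental unit addition gated by a validation-contribution criterion, and a closed-form ridge-regression readout recomputed at each structural update. Random tanh units stand in for spiking-rate features here; this is an assumption for illustration, not the NG-SNN architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def ridge_readout(H, Y, reg=1e-2):
    """Closed-form readout weights: W = (H^T H + reg*I)^{-1} H^T Y."""
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + reg * np.eye(d), H.T @ Y)

def accuracy(H, W, labels):
    return float(np.mean(np.argmax(H @ W, axis=1) == labels))

# Toy data: a 3-class problem on random rate-like features.
X = rng.normal(size=(300, 20))
labels = rng.integers(0, 3, size=300)
Y = np.eye(3)[labels]
X_tr, Y_tr = X[:200], Y[:200]
X_val, labels_val = X[200:], labels[200:]

hidden = []                 # projection vectors of accepted units
best_acc = 0.0
for _ in range(100):        # candidate "neurogenesis" events
    candidate = rng.normal(size=20)
    trial = hidden + [candidate]
    P = np.stack(trial, axis=1)                  # (in_dim, n_units)
    H_tr = np.tanh(X_tr @ P)                     # unit activations (rate proxy)
    W = ridge_readout(H_tr, Y_tr)                # single-shot analytical update
    acc = accuracy(np.tanh(X_val @ P), W, labels_val)
    if acc > best_acc + 1e-3:                    # contribution criterion
        hidden, best_acc = trial, acc            # keep the neuron
print(len(hidden), "units kept, validation accuracy", round(best_acc, 3))
```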
{"title":"NG-SNN: A neurogenesis-inspired dynamic adaptive framework for efficient spike classification","authors":"Jing Tang , Depeng Li , Zhenyu Zhang , Zhigang Zeng","doi":"10.1016/j.neunet.2026.108656","DOIUrl":"10.1016/j.neunet.2026.108656","url":null,"abstract":"<div><div>Spiking neural networks (SNNs) are designed for low-power neuromorphic computing. A widely adopted hybrid paradigm decouples feature extraction from classification to improve biological plausibility and modularity. However, this decoupling concentrates decision making in the downstream classifier, which in many systems becomes the limiting factor for both accuracy and efficiency. Hand-preset, fixed topologies risk either redundancy or insufficient capacity, and surrogate-gradient training remains computationally costly. Biological neurogenesis is the brain’s mechanism for adaptively adding new neurons to build efficient, task-specific circuits. Inspired by this process, we propose the neurogenesis-inspired spiking neural network (NG-SNN), a dynamic adaptive framework that uses two key innovations to address these challenges. Specifically, we first introduce a supervised incremental construction mechanism that dynamically grows a task-optimal structure by selectively integrating neurons under a contribution criterion. Second, we devise an activity-dependent analytical learning method that replaces iterative optimization with single-shot and adaptive weight computation for each structural update, drastically improving training efficiency. Therefore, NG-SNN uniquely integrates dynamic structural adaptation with efficient non-iterative learning, forming a self-organizing and rapidly converging classification system. Moreover, this neurogenesis-driven process endows NG-SNN with a highly compact structure that requires significantly fewer parameters. Extensive experiments demonstrate that our NG-SNN matches or outperforms its competitors on diverse datasets, without the overhead of iterative training and manual architecture tuning.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108656"},"PeriodicalIF":6.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differentially private data augmentation via LLM generation with discriminative and distribution-aligned filtering
Pub Date: 2026-01-29 DOI: 10.1016/j.neunet.2026.108668
Yiping Song, Juhua Zhang, Zhiliang Tian, Taishu Sheng, Yuxin Yang, Minlie Huang, Xinwang Liu, Dongsheng Li
Data augmentation (DA) is a widely adopted approach for mitigating data insufficiency. Conducting DA in private domains requires privacy-preserving text generation, typically through anonymization or perturbation of sensitive textual data; however, such heuristics lack formal protection guarantees. Existing Differential Privacy (DP) learning methods provide theoretical guarantees by adding calibrated noise to models or outputs. However, the large output space and model scale in text generation require substantial noise, which severely degrades synthesis quality. In this paper, we shift the role of DP from synthetic sample generation to sample discrimination. Specifically, we propose a DP-based DA framework with a large language model (LLM) and a DP-based discriminator for private-domain text generation. Our key idea is to (1) leverage LLMs to generate large-scale high-quality samples, (2) select synthesized samples fitting the private domain, and (3) align the label distribution with the private domain. To achieve this, we use knowledge distillation to construct a DP-based discriminator: teacher models, accessing private data, guide a student model to select samples under calibrated noise. A DP-based tutor further constrains the label distribution of synthesized samples with a low privacy budget. We theoretically analyze the privacy guarantees and empirically validate our method on three medical text classification datasets, showing that our DP-synthesized samples significantly outperform state-of-the-art DP fine-tuning baselines in utility.
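One standard way teachers can guide selection under calibrated noise is PATE-style noisy voting, sketched below; whether the paper's discriminator uses exactly this aggregation is an assumption on our part, and the privacy accounting that calibrates the noise scale is omitted.

```python
import numpy as np

rng = np.random.default_rng(7)

def noisy_vote(teacher_votes, noise_scale=1.0):
    """Aggregate binary keep/discard votes from teachers with Laplace noise.

    teacher_votes: (n_teachers,) array of 0/1 decisions for one synthetic sample.
    Returns 1 (keep the sample) if the noisy vote count for "keep" wins.
    Calibrating noise_scale to the number of queries yields a DP guarantee
    in the PATE framework; the exact accounting is not shown here.
    """
    counts = np.bincount(teacher_votes, minlength=2).astype(float)
    counts += rng.laplace(scale=noise_scale, size=2)
    return int(np.argmax(counts))

# Toy usage: 10 teachers screening 5 synthetic samples.
votes = rng.integers(0, 2, size=(5, 10))
decisions = [noisy_vote(v) for v in votes]
print(decisions)
```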
{"title":"Differentially private data augmentation via LLM generation with discriminative and distribution-aligned filtering","authors":"Yiping Song , Juhua Zhang , Zhiliang Tian , Taishu Sheng , Yuxin Yang , Minlie Huang , Xinwang Liu , Dongsheng Li","doi":"10.1016/j.neunet.2026.108668","DOIUrl":"10.1016/j.neunet.2026.108668","url":null,"abstract":"<div><div>Data augmentation (DA) is a widely adopted approach for mitigating data insufficiency. Conducting DA in private domains requires privacy-preserving text generation, including anonymization or perturbation applied to sensitive textual data. The above methods lack formal protection guarantees. Existing Differential Privacy (DP) learning methods provide theoretical guarantees by adding calibrated noise to models or outputs. However, the large output space and model scales in text generation require substantial noise, which severely degrades synthesis quality. In this paper, we transfer DP-based synthetic sample generation to DP-based sample discrimination. Specifically, we propose a DP-based DA framework with a large language model (LLM) and a DP-based discriminator for private-domain text generation. Our key idea is to (1) leverage LLMs to generate large-scale high-quality samples, (2) select synthesized samples fitting the private domain, and (3) align the label distribution with the private domain. To achieve this, we use knowledge distillation to construct a DP-based discriminator: teacher models, accessing private data, guide a student model to select samples under calibrated noise. A DP-based tutor further constrains the label distribution of synthesized samples with a low privacy budget. We theoretically analyze the privacy guarantees and empirically validate our method on three medical text classification datasets, showing that our DP-synthesized samples significantly outperform state-of-the-art DP fine-tuning baselines in utility.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108668"},"PeriodicalIF":6.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}