
Neural Networks: Latest Articles

NaturalL2S: End-to-end high-quality multispeaker lip-to-speech synthesis with differential digital signal processing
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | Epub Date: 2025-10-01 | DOI: 10.1016/j.neunet.2025.108163
Yifan Liang, Fangkun Liu, Andong Li, Xiaodong Li, Chengyou Lei, Chengshi Zheng

Recent advancements in visual speech recognition (VSR) have promoted progress in lip-to-speech synthesis, where pre-trained VSR models enhance the intelligibility of synthesized speech by providing valuable semantic information. The success achieved by cascade frameworks, which combine pseudo-VSR with pseudo-text-to-speech (TTS) or implicitly utilize the transcribed text, highlights the benefits of leveraging VSR models. However, these methods typically rely on mel-spectrograms as an intermediate representation, which may introduce a key bottleneck: the domain gap between synthetic mel-spectrograms, generated from inherently error-prone lip-to-speech mappings, and real mel-spectrograms used to train vocoders. This mismatch inevitably degrades synthesis quality. To bridge this gap, we propose Natural Lip-to-Speech (NaturalL2S), an end-to-end framework that jointly trains the vocoder with the acoustic inductive priors. Specifically, our architecture introduces a fundamental frequency (F0) predictor to explicitly model prosodic variations, where the predicted F0 contour drives a differentiable digital signal processing (DDSP) synthesizer to provide acoustic priors for subsequent refinement. Notably, the proposed system achieves satisfactory performance on speaker similarity without requiring explicit speaker embeddings. Both objective metrics and subjective listening tests demonstrate that NaturalL2S significantly enhances synthesized speech quality compared to existing state-of-the-art methods. Audio samples are available on our demonstration page: https://yifan-liang.github.io/NaturalL2S/.
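The core DDSP idea the abstract relies on — an oscillator bank driven by a predicted F0 contour — can be illustrated with a minimal additive harmonic synthesizer. This is a sketch under stated assumptions (sample rate, harmonic amplitudes, and function names are illustrative, not the paper's implementation):

```python
import math

def ddsp_harmonic_synth(f0, amps, sr=16000):
    """Render a waveform from a per-sample F0 contour (Hz) via additive
    harmonic synthesis. `amps` holds one amplitude per partial."""
    phase = 0.0
    out = []
    for f in f0:
        phase += 2.0 * math.pi * f / sr          # integrate instantaneous frequency
        sample = 0.0
        for k, a in enumerate(amps, start=1):
            if k * f < sr / 2:                   # drop partials above Nyquist
                sample += a * math.sin(k * phase)
        out.append(sample)
    return out

# A 100 Hz tone with three harmonics, 100 samples long.
wave = ddsp_harmonic_synth([100.0] * 100, [1.0, 0.5, 0.25])
```

Because every operation is a smooth function of `f0` and `amps`, the same computation written in an autodiff framework is differentiable end-to-end, which is what lets the synthesizer's output serve as an acoustic prior inside a jointly trained network.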

Citations: 0
Emotion-Aware multimodal deepfake detection
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-31 | DOI: 10.1016/j.neunet.2026.108675
Teng Zhang , Gen Li , Yanhui Xiao , Huawei Tian , Yun Cao
With the continuous advancement of Deepfake techniques, traditional unimodal detection methods struggle to address the challenges posed by multimodal manipulations. Most existing approaches rely on large-scale training data, which limits their generalization to unseen identities or different manipulation types in few-shot settings. In this paper, we propose an emotion-aware multimodal Deepfake detection method that exploits emotion signals for forgery detection. Specifically, we design an emotion embedding extractor (Emoencoder) to capture emotion representations within modalities. Then, we employ Emotion-Aware Contrastive Learning and Cross-Modal Contrastive Learning to capture cross-modal inconsistencies and enhance modality feature extraction. Furthermore, we propose a Text-Guided Semantic Fusion module, where the text modality serves as a semantic anchor to guide audio-visual feature interactions for multimodal feature fusion. To validate our approach under data-limited conditions and unseen identities, we employ a cross-identity few-shot training strategy on benchmark datasets. Experimental results demonstrate that our method outperforms SOTAs and demonstrates superior generalization to both unseen identities and manipulation types.
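The cross-modal contrastive objective mentioned above pulls together embeddings of matched modality pairs (e.g. the audio and visual tracks of the same clip) while pushing apart mismatched ones. A minimal InfoNCE-style sketch, with toy 2-D embeddings and an illustrative temperature (not the paper's exact loss):

```python
import math

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE over cosine similarities: each anchor's positive is the
    same-index vector in `positives`; other indices act as negatives."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    def cos(u, v):
        return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cos(a, p) / temperature for p in positives]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        loss += -(logits[i] - log_denom)   # negative log-softmax of the positive
    return loss / len(anchors)

# Aligned audio/visual pairs score a much lower loss than shuffled pairs,
# which is exactly the signal a forged (mismatched) clip would trip.
emb_a = [[1.0, 0.0], [0.0, 1.0]]
emb_b = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(emb_a, emb_b)
shuffled = info_nce(emb_a, emb_b[::-1])
```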
Citations: 0
Event-triggered decentralized adaptive critic learning control for interconnected systems with nonlinear inequality state constraints
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-31 | DOI: 10.1016/j.neunet.2026.108646
Wenqian Du , Mingduo Lin , Guoling Yuan , Bo Zhao
In this paper, an event-triggered decentralized adaptive critic learning (ACL) control method is proposed for interconnected systems with nonlinear inequality state constraints. First, by introducing a slack function, the nonlinear inequality state constraints of original isolated subsystem are transformed into equality forms, and then the original isolated subsystem is augmented to an unconstrained one. Then, by establishing a cost function with discount factors for each isolated subsystem, a local policy iteration-based decentralized control law is developed by solving the Hamilton–Jacobi–Bellman equation with the help of a local critic neural network (NN) for each isolated subsystem. Through developing a novel event-triggering mechanism for each isolated subsystem, the decentralized control policy is updated at the triggering instants only, which assists to save the computational and communication resources. Hereafter, the event-triggered decentralized control law of isolated subsystem is derived. Then, the overall optimal control for the entire interconnected system is derived by constituting an array of developed event-triggered decentralized control laws. Furthermore, the closed-loop nonlinear interconnected system and the weight estimation errors of local critic NNs are guaranteed to be uniformly ultimately bounded. Finally, the effectiveness of the proposed method is validated through two comparative simulation examples.
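The resource-saving mechanism at the heart of the abstract — recomputing the control input only at triggering instants — can be sketched on a hypothetical scalar system. The paper treats interconnected nonlinear subsystems with critic NNs; this toy example only shows the triggering logic itself, with illustrative gain and threshold values:

```python
def simulate_event_triggered(x0, gain, threshold, steps, dt=0.05):
    """Euler simulation of x' = x + u where u = -gain * x_sampled is
    recomputed only when the gap between the current state and the
    last-sampled state exceeds `threshold` (the triggering condition)."""
    x = x0
    x_sampled = x0                 # state at the last triggering instant
    u = -gain * x_sampled
    events = 0
    for _ in range(steps):
        if abs(x - x_sampled) > threshold:   # event-triggering condition
            x_sampled = x
            u = -gain * x_sampled            # control updated only at events
            events += 1
        x += (x + u) * dt                    # plant evolves with stale control
    return x, events

x_final, n_events = simulate_event_triggered(x0=1.0, gain=3.0, threshold=0.05, steps=200)
```

The state is driven into a small neighborhood of the origin (uniform ultimate boundedness rather than exact convergence) while the controller updates far fewer than 200 times, which is the computational/communication saving the paper formalizes.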
Citations: 0
Adaptive sample repulsion against class-specific counterfactuals for explainable imbalanced classification
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-30 | DOI: 10.1016/j.neunet.2026.108652
Yu Hao , Xin Gao , Xinping Diao , Yuan Li , Yukun Lin , Tianyang Chen , Qiangwei Li , Jiawen Lu
Enhancing model classification capability for samples within overlapping regions in complex feature spaces remains a key challenge in imbalanced classification research. Existing mainstream methods at the data-level and algorithm-level primarily rely on original sample distribution information to reduce overlap impact, without deeply modeling the causal relationship between features and labels. Furthermore, these approaches often overlook instance-level explanations that could guide deep discriminative information mining for samples of different classes in overlapping regions, thus the improvement on classification performance and model credibility may be constrained. This paper proposes an explainable imbalanced classification framework with adaptive sample repulsion against class-specific counterfactuals (CSCF-SR), forming a closed-loop between explanation generation and classification decisions by dynamically regulating the feature-space distribution through generated counterfactual samples. Two core phases are jointly optimized. (1) Counterfactual searching: a class-specific dual-actor architecture based on reinforcement learning decouples perturbation policy learning for majority and minority classes. A multi-step dynamic perturbation mechanism is designed to control counterfactual search behavior more precisely and smoothly, effectively generating reliable counterfactual samples. (2) Adaptive sample repulsion against counterfactuals: exploiting the inter-class discriminative information in displacement vectors between counterfactual and original samples, each original sample is adaptively perturbed along the direction opposite to its counterfactual. This fine-grained regulation gradually displaces samples from the overlapping region and clarifies class boundaries. 
Experiments on 50 imbalanced datasets demonstrate that CSCF-SR has a performance advantage over 27 typical imbalanced classification methods on both F1-score and G-mean, with more pronounced improvements on 25 datasets with severe class overlap.
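The repulsion step in phase (2) — moving each sample against the displacement vector pointing toward its counterfactual — is geometrically simple. A minimal sketch with a fixed global step size (the paper adapts the step per sample; values here are illustrative):

```python
def repel_from_counterfactuals(samples, counterfactuals, step=0.3):
    """Nudge each sample along the direction opposite its counterfactual,
    i.e. away from the class boundary it would have to cross to flip label,
    thinning out the overlapping region."""
    repelled = []
    for x, cf in zip(samples, counterfactuals):
        disp = [c - xi for xi, c in zip(x, cf)]           # sample -> counterfactual
        repelled.append([xi - step * d for xi, d in zip(x, disp)])
    return repelled

X  = [[0.0, 0.0], [1.0, 1.0]]   # two samples from opposite classes
CF = [[1.0, 0.0], [0.0, 1.0]]   # their counterfactuals across the boundary
X_new = repel_from_counterfactuals(X, CF)
```

Each sample moves away from where its counterfactual lies, so the two classes separate: the first sample shifts toward negative x, the second toward larger x.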
Citations: 0
Multi-timescale representation with adaptive routing for deep tabular learning under temporal shift
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-30 | DOI: 10.1016/j.neunet.2026.108670
Tianyu Wang , Maite Zhang , Mingxuan Lu , Mian Li
In real-world applications, tabular datasets often evolve over time, leading to temporal shift that degrades the long-range neural network performance. Most existing temporal encoding or adaptation solutions treat time cues as fixed auxiliary variables at a single scale. Motivated by the multi-horizon nature of temporal shifts with heterogeneous temporal dynamics, this paper presents TARS (Temporal Abstraction with Routed Scales), a novel plug-and-play method for robust tabular learning under temporal shift, applicable to various deep learning model backbones. First, an explicit temporal encoder decomposes timestamps into short-term recency, mid-term periodicity, and long-term contextual embeddings with structured memory. Next, an implicit drift encoder tracks higher-order distributional statistics at the same aligned timescales, producing drift signals that reflect ongoing temporal dynamics. These signals drive a drift-aware routing mechanism that adaptively weights the explicit temporal pathways, emphasizing the most relevant timescales under current conditions. Finally, a feature-temporal fusion layer integrates the routed temporal representation with original features, injecting context-aware bias. Extensive experiments on eight real-world datasets from the TabReD benchmark show that TARS consistently outperforms the competitive compared methods across various backbone models, achieving up to +2.38% average relative improvement on MLP, +4.08% on DCNv2, etc. Ablation studies verify the complementary contributions of all four modules. These results highlight the effectiveness of TARS for improving the temporal robustness of existing deep tabular models.
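The explicit temporal encoder's decomposition — short-term recency, mid-term periodicity, long-term context — can be sketched as a small feature map over raw timestamps. The decay constant, weekly period, and feature layout below are illustrative assumptions, not the paper's exact encoder:

```python
import math

def encode_timestamp(t, t_now, period=7 * 86400, tau=30 * 86400):
    """Map a Unix timestamp to multi-timescale features:
    - recency: exponential decay with age (short-term)
    - sin/cos over a weekly cycle (mid-term periodicity)
    - linear trend in years since epoch (long-term context)."""
    age = t_now - t
    recency = math.exp(-age / tau)
    angle = 2.0 * math.pi * (t % period) / period
    trend = t / (365.0 * 86400)
    return [recency, math.sin(angle), math.cos(angle), trend]

# An event 30 days old, observed "now" at day 130.
feat = encode_timestamp(t=100 * 86400, t_now=130 * 86400)
```

A drift-aware router can then reweight these pathways: e.g. when weekly statistics drift, the periodic features are down-weighted relative to recency.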
Citations: 0
Efficient semantic segmentation via logit-guided feature distillation
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1016/j.neunet.2026.108663
Xuyi Yu , Shang Lou , Yinghai Zhao , Huipeng Zhang , Kuizhi Mei
Knowledge Distillation (KD) is a critical technique for model compression, facilitating the transfer of implicit knowledge from a teacher model to a more compact, deployable student model. KD can be generally divided into two categories: logit distillation and feature distillation. Feature distillation has been predominant in achieving state-of-the-art (SOTA) performance, but recent advances in logit distillation have begun to narrow the gap. We propose a Logit-guided Feature Distillation (LFD) framework that combines the strengths of both logit and feature distillation to enhance the efficacy of knowledge transfer, particularly leveraging the rich classification information inherent in logits for semantic segmentation tasks. Furthermore, it is observed that Deep Neural Networks (DNNs) only manifest task-relevant characteristics at sufficient depths, which may be a limiting factor in achieving higher accuracy. In this work, we introduce a collaborative distillation method that preemptively focuses on critical pixels and categories in the early stage. We employ logits from deep layers to generate fine-grained spatial masks that are directly conveyed to the feature distillation stage, thereby inducing spatial gradient disparities. Additionally, we generate class masks that dynamically modulate the weights of shallow auxiliary heads, ensuring that class-relevant features can be calibrated by the primary head. A novel shared auxiliary head distillation approach is also presented. Experiments on the Cityscapes, Pascal VOC, and CamVid datasets show that the proposed method achieves competitive performance while maintaining low memory usage. Our codes will be released in https://github.com/fate2715/LFD.
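One way to derive a fine-grained spatial mask from logits, in the spirit of the abstract, is to weight each pixel by the teacher's softmax entropy, so ambiguous boundary pixels receive more distillation pressure. The entropy-based mask here is an illustrative choice, not the paper's exact construction:

```python
import math

def logit_spatial_mask(logits, temperature=1.0):
    """Per-pixel weights in [0, 1] from per-pixel logit vectors:
    uncertain pixels (high softmax entropy) get weights near 1,
    confident pixels get weights near 0."""
    mask = []
    for pixel_logits in logits:
        scaled = [l / temperature for l in pixel_logits]
        m = max(scaled)                                 # stabilize the softmax
        exps = [math.exp(l - m) for l in scaled]
        z = sum(exps)
        probs = [e / z for e in exps]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        mask.append(entropy / math.log(len(pixel_logits)))  # normalize by log C
    return mask

# A confident pixel vs. an ambiguous boundary pixel (3 classes).
mask = logit_spatial_mask([[8.0, 0.0, 0.0], [1.0, 1.0, 0.9]])
```

Multiplying the per-pixel feature-distillation loss by such a mask induces the spatial gradient disparities the abstract describes: gradients concentrate on hard pixels instead of being spread uniformly.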
Citations: 0
Resolving ambiguity in code refinement via conidfine: A conversationally-Aware framework with disambiguation and targeted retrieval
IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1016/j.neunet.2026.108650
Aoyu Song , Afizan Azman , Shanzhi Gu , Fangjian Jiang , Jianchi Du , Tailong Wu , Mingyang Geng , Jia Li
Code refinement is a vital aspect of software development, involving the review and enhancement of code contributions made by developers. A critical challenge in this process arises from unclear or ambiguous review comments, which can hinder developers’ understanding of the required changes. Our preliminary study reveals that conversations between developers and reviewers often contain valuable information that can help resolve such ambiguous review suggestions. However, leveraging conversational data to address this issue poses two key challenges: (1) enabling the model to autonomously determine whether a review suggestion is ambiguous, and (2) effectively extracting the relevant segments from the conversation that can aid in resolving the ambiguity.
In this paper, we propose a novel method for addressing ambiguous review suggestions by leveraging conversations between reviewers and developers. To tackle the above two challenges, we introduce an Ambiguous Discriminator that uses multi-task learning to classify ambiguity and generate type-aware confusion points from a GPT-4-labeled dataset. These confusion points guide a Type-Driven Multi-Strategy Retrieval Framework that applies targeted strategies based on categories like Inaccurate Localization, Unclear Expression, and Lack of Specific Guidance to extract actionable information from the conversation context. To support this, we construct a semantic auxiliary instruction library containing spatial indicators, clarification patterns, and action-oriented verbs, enabling precise alignment between review suggestions and informative conversation segments. Our method is evaluated on two widely-used code refinement datasets CodeReview and CodeReview-New, where we demonstrate that our method significantly enhances the performance of various state-of-the-art models, including TransReview, T5-Review, CodeT5, CodeReviewer and ChatGPT. Furthermore, we explore in depth how conversational information improves the model’s ability to address fine-grained situations, and we conduct human evaluations to assess the accuracy of ambiguity detection and the correctness of generated confusion points. We are the first to introduce the issue of ambiguous review suggestions in the code refinement domain and propose a solution that not only addresses these challenges but also sets the foundation for future research. Our method provides valuable insights into improving the clarity and effectiveness of review suggestions, offering a promising direction for advancing code refinement techniques.
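The type-driven retrieval idea — choosing a different extraction strategy per ambiguity category — can be sketched as a dispatch over keyword cues. The category names follow the paper; the keyword lists are illustrative stand-ins for the semantic auxiliary instruction library, not its actual contents:

```python
def retrieve_for_confusion(confusion_type, conversation):
    """Return conversation turns matching the cue set for the given
    ambiguity category (a toy proxy for targeted retrieval)."""
    strategies = {
        "inaccurate_localization": ("line", "file", "above", "below", "here"),
        "unclear_expression": ("mean", "clarify", "i.e.", "in other words"),
        "lack_of_specific_guidance": ("should", "instead", "try", "use"),
    }
    cues = strategies[confusion_type]
    return [turn for turn in conversation
            if any(cue in turn.lower() for cue in cues)]

conv = [
    "Reviewer: this looks wrong.",
    "Dev: which part do you mean?",
    "Reviewer: the null check, use Optional instead.",
]
hits = retrieve_for_confusion("lack_of_specific_guidance", conv)
```

In the full framework this filtering would be semantic rather than lexical, and the retrieved turns would be fed back to the code-refinement model alongside the ambiguous review comment.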
Resolving ambiguity in code refinement via conidfine: A conversationally-aware framework with disambiguation and targeted retrieval — Aoyu Song, Afizan Azman, Shanzhi Gu, Fangjian Jiang, Jianchi Du, Tailong Wu, Mingyang Geng, Jia Li. Neural Networks, vol. 199, Article 108650. DOI: 10.1016/j.neunet.2026.108650
Citations: 0
Warm-start or cold-start? A comparison of generalizability in gradient-based hyperparameter tuning
IF 6.3 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-29 DOI: 10.1016/j.neunet.2026.108647
Yubo Zhou, Jun Shu, Chengli Tan, Haishan Ye, Quanziang Wang, Junmin Liu, Deyu Meng, Ivor Tsang, Guang Dai
Bilevel optimization (BO) has garnered increasing attention in hyperparameter tuning. BO methods commonly adopt one of two strategies for the inner level: cold-start, which restarts the inner solver from a fixed initialization each time, and warm-start, which reuses the last approximate inner solution as the starting point. Previous studies mainly argued that warm-start exhibits better convergence properties; here we provide a detailed comparison of the two strategies from a generalization perspective. Our findings indicate that, compared to the cold-start strategy, the warm-start strategy exhibits worse generalization performance, such as more severe overfitting on the validation set. To explain this, we establish generalization bounds for the two strategies. We reveal that the warm-start strategy yields a worse generalization upper bound because of its tighter interaction with the inner-level dynamics, which naturally leads to poor generalization performance. Motivated by these theoretical results, we propose several approaches to enhance the generalization capability of warm-start and narrow its gap with cold-start, notably a novel random-perturbation initialization method. Experiments validate the soundness of our theoretical analysis and the effectiveness of the proposed approaches.
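The two inner-level strategies can be contrasted on a toy quadratic inner objective. Everything here (function names, the Gaussian form of the perturbation) is an illustrative assumption, not the paper's code:

```python
import numpy as np

def inner_solver(grad, w_init, lr=0.1, steps=20):
    """A few gradient steps on the inner-level objective."""
    w = w_init.copy()
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def bilevel_loop(outer_steps=5, warm_start=True, perturb_sigma=0.0, dim=3, seed=0):
    rng = np.random.default_rng(seed)
    target = np.ones(dim)                  # toy inner optimum
    grad = lambda w: w - target            # gradient of 0.5 * ||w - target||^2
    w0 = np.zeros(dim)                     # fixed cold-start initialization
    w = w0
    for _ in range(outer_steps):
        init = w if warm_start else w0     # warm: reuse last inner solution
        if warm_start and perturb_sigma > 0:
            # Random perturbation at re-initialization -- our guess at the
            # form of the paper's proposed fix, assumed Gaussian here.
            init = init + perturb_sigma * rng.normal(size=dim)
        w = inner_solver(grad, init)
    return w
```

On this toy problem warm-start tracks the inner optimum much more closely than cold-start; the paper's point is that exactly this tighter coupling to the inner-level dynamics is what degrades generalization on the validation set.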
Neural Networks, vol. 199, Article 108647.
Citations: 0
NG-SNN: A neurogenesis-inspired dynamic adaptive framework for efficient spike classification
IF 6.3 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-29 DOI: 10.1016/j.neunet.2026.108656
Jing Tang, Depeng Li, Zhenyu Zhang, Zhigang Zeng
Spiking neural networks (SNNs) are designed for low-power neuromorphic computing. A widely adopted hybrid paradigm decouples feature extraction from classification to improve biological plausibility and modularity. However, this decoupling concentrates decision making in the downstream classifier, which in many systems becomes the limiting factor for both accuracy and efficiency. Hand-preset, fixed topologies risk either redundancy or insufficient capacity, and surrogate-gradient training remains computationally costly. Biological neurogenesis is the brain’s mechanism for adaptively adding new neurons to build efficient, task-specific circuits. Inspired by this process, we propose the neurogenesis-inspired spiking neural network (NG-SNN), a dynamic adaptive framework that uses two key innovations to address these challenges. Specifically, we first introduce a supervised incremental construction mechanism that dynamically grows a task-optimal structure by selectively integrating neurons under a contribution criterion. Second, we devise an activity-dependent analytical learning method that replaces iterative optimization with single-shot and adaptive weight computation for each structural update, drastically improving training efficiency. Therefore, NG-SNN uniquely integrates dynamic structural adaptation with efficient non-iterative learning, forming a self-organizing and rapidly converging classification system. Moreover, this neurogenesis-driven process endows NG-SNN with a highly compact structure that requires significantly fewer parameters. Extensive experiments demonstrate that our NG-SNN matches or outperforms its competitors on diverse datasets, without the overhead of iterative training and manual architecture tuning.
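The combination of supervised incremental construction and analytical weight computation can be illustrated with a non-spiking toy. Random tanh units stand in for spiking neurons, and the residual-reduction test and closed-form least-squares readout below are simplified guesses at the paper's contribution criterion and single-shot weight computation:

```python
import numpy as np

def grow_classifier(X, y, max_units=50, tol=1e-3, seed=0):
    """Grow a one-hidden-layer readout neuron by neuron.  A candidate unit is
    kept only if it lowers the least-squares residual by more than `tol`
    (a stand-in for the contribution criterion); the output weights are
    recomputed in closed form at every accepted step, so there is no
    iterative gradient training."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = np.empty((n, 0))                        # activities of accepted units
    units, W = [], None
    best_res = float(np.linalg.norm(y) ** 2)
    for _ in range(max_units):
        w_in = rng.normal(size=d)               # random candidate unit
        h = np.tanh(X @ w_in)[:, None]          # candidate unit's activity
        H_try = np.hstack([H, h])
        beta, *_ = np.linalg.lstsq(H_try, y, rcond=None)   # analytical readout
        new_res = float(np.linalg.norm(y - H_try @ beta) ** 2)
        if best_res - new_res > tol:            # contribution criterion
            H, W, best_res = H_try, beta, new_res
            units.append(w_in)
    return units, W
```

The resulting network is compact by construction: units that do not measurably help the task are never integrated, which mirrors the neurogenesis-driven compactness claimed for NG-SNN.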
Neural Networks, vol. 199, Article 108656.
Citations: 0
Differentially private data augmentation via LLM generation with discriminative and distribution-aligned filtering
IF 6.3 CAS Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-29 DOI: 10.1016/j.neunet.2026.108668
Yiping Song, Juhua Zhang, Zhiliang Tian, Taishu Sheng, Yuxin Yang, Minlie Huang, Xinwang Liu, Dongsheng Li
Data augmentation (DA) is a widely adopted approach for mitigating data insufficiency. Conducting DA in private domains requires privacy-preserving text generation, such as anonymization or perturbation applied to sensitive textual data, but these methods lack formal protection guarantees. Existing differential privacy (DP) learning methods provide theoretical guarantees by adding calibrated noise to models or outputs. However, the large output space and model scale in text generation require substantial noise, which severely degrades synthesis quality. In this paper, we shift the DP mechanism from synthetic sample generation to synthetic sample discrimination. Specifically, we propose a DP-based DA framework that pairs a large language model (LLM) with a DP-based discriminator for private-domain text generation. Our key idea is to (1) leverage LLMs to generate large-scale, high-quality samples, (2) select synthesized samples that fit the private domain, and (3) align the label distribution with the private domain. To achieve this, we use knowledge distillation to construct a DP-based discriminator: teacher models with access to the private data guide a student model to select samples under calibrated noise. A DP-based tutor further constrains the label distribution of synthesized samples with a low privacy budget. We theoretically analyze the privacy guarantees and empirically validate our method on three medical text classification datasets, showing that our DP-synthesized samples significantly outperform state-of-the-art DP fine-tuning baselines in utility.
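The teacher-student distillation under calibrated noise resembles PATE-style noisy voting. A minimal sketch follows; the function name, the Gaussian mechanism, and the binary keep/discard labels are our assumptions, not the paper's exact protocol:

```python
import numpy as np

def dp_aggregate_vote(teacher_votes, num_classes=2, sigma=1.0, seed=0):
    """Aggregate per-teacher labels for one synthetic sample.  Each teacher is
    trained on a disjoint shard of the private data, so changing one private
    record sways at most one vote (sensitivity 1); adding noise calibrated to
    that sensitivity before the argmax is what yields the DP guarantee."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.normal(scale=sigma, size=num_classes)   # calibrated noise
    return int(np.argmax(counts))

# The student (the discriminator that filters LLM-generated samples) is then
# trained only on these noisy labels, never on the private data itself.
```

Because the student sees only noisy aggregate votes, its privacy cost is paid per query rather than per model parameter, which is why filtering can tolerate far less noise than generating text under DP directly.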
Neural Networks, vol. 199, Article 108668.
Citations: 0