
Latest Publications in IEEE Transactions on Artificial Intelligence

Context-Guided Multiscale Attention for Real-Time Semantic Segmentation of Road Scene
Pub Date: 2025-09-08 | DOI: 10.1109/TAI.2025.3606904
Saquib Mazhar;Nadeem Atif;M.K. Bhuyan;Shaik Rafi Ahamed
Lightweight deep neural networks play a pivotal role in real-time semantic segmentation for autonomous driving on resource-constrained devices; because object sizes vary, such networks must effectively learn local semantics and global context at multiple scales. Recent methods design shallow, lightweight backbones with a small receptive field for faster inference, together with additional mechanisms, such as attention, to compensate for the accuracy loss caused by the lightweight design. While some methods exploit multiscale feature learning by attaching pyramid modules at the encoder end, it is often neglected at the fundamental block level because of the increased inference time. Furthermore, attention weights are mostly generated at a single object scale, using only high-level feature representations. To solve the first problem, a key module for the basic block, the fast hybrid module, is proposed. This module learns multiscale features by combining dilated kernels and downsampling operations in a parallel three-branch structure. To solve the second problem, a novel attention module, the multiscale attention module (MSAM), is proposed. MSAM generates context weights at varying scales from low-level features rich in object boundary and edge information and multiplies them by the high-level semantic features obtained from the encoder. With these modules, a novel encoder–decoder network, the context-guided multiscale attention network, is proposed. With only 0.54 million parameters, the network achieves 73.4% and 68.1% mean IoU at 128.24 and 85.5 FPS on the Cityscapes and CamVid datasets, respectively. In addition, the network runs in real time on resource-constrained embedded GPUs. Extensive ablation studies demonstrate the effectiveness of the proposed modules and network, and qualitative results on unseen data demonstrate the robustness of the method.
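To make the multiscale weighting idea concrete, here is a toy 1-D sketch of MSAM-style attention, assuming a simplified setting: average pooling stands in for the paper's multiscale context extraction, and all feature values and scales are illustrative, not taken from the paper.

```python
import math

def avg_pool(x, scale):
    """Downsample a 1-D feature map by averaging non-overlapping windows,
    then upsample back by nearest-neighbor repetition."""
    pooled = [sum(x[i:i + scale]) / scale for i in range(0, len(x), scale)]
    return [pooled[min(i // scale, len(pooled) - 1)] for i in range(len(x))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def multiscale_attention(low_level, high_level, scales=(1, 2, 4)):
    """Toy 1-D analogue of the MSAM idea: context weights are derived from
    low-level features at several scales, then multiplied with the
    high-level semantic features."""
    assert len(low_level) == len(high_level)
    out = []
    for i in range(len(high_level)):
        # average the per-scale context responses at position i
        w = sum(sigmoid(avg_pool(low_level, s)[i]) for s in scales) / len(scales)
        out.append(w * high_level[i])
    return out

low = [0.1, 0.9, 0.8, 0.2, 0.05, 0.7, 0.6, 0.3]   # boundary/edge-rich low-level signal
high = [1.0] * 8                                  # high-level semantic features
weighted = multiscale_attention(low, high)
```

Positions where the low-level signal is strong receive larger weights, mimicking how boundary-rich low-level features gate the high-level semantics.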
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1761–1775.
Citations: 0
Let Invariant Learning Inspire Neighbor-Shift Generalization on Graphs
Pub Date: 2025-09-05 | DOI: 10.1109/TAI.2025.3605894
Jiaxing Li;Jiayi Gao;Binhao Gu;Youyong Kong
Graph neural networks (GNNs) have achieved strong performance on various graph learning tasks under the assumption of independently and identically distributed (IID) data. However, recent studies reveal that GNNs suffer from performance drops under distribution shifts, prompting growing interest in out-of-distribution (OOD) generalization. In this work, we identify a previously underexplored challenge, Neighbor Shift, which refers to structural inconsistencies in node neighborhoods across environments. We analyze its characteristics and demonstrate its negative impact on node-level classification. To tackle this issue, we propose the neighbor-shift robust GNN (NSRGNN), which disentangles invariant and variant subgraphs through conflict-based structure analysis, infers latent environments using the variant components, and regularizes semantic consistency of node representations across inferred environments. Extensive experiments on both real-world and synthetic benchmarks show that NSRGNN consistently outperforms strong OOD baselines and exhibits robust generalization under diverse structural shifts.
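The neighborhood inconsistency described here can be quantified in a simple way. The sketch below is not the paper's NSRGNN mechanism; it merely measures neighbor shift for one node as the total-variation distance between its neighbor label distributions in two environments (the toy graphs and labels are invented for illustration).

```python
def neighbor_label_distribution(adj, labels, node, num_classes):
    """Empirical label distribution over a node's neighbors."""
    counts = [0] * num_classes
    for nb in adj[node]:
        counts[labels[nb]] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def neighbor_shift(adj_a, adj_b, labels, node, num_classes):
    """Total-variation distance between the node's neighbor label
    distributions in environments A and B; 0 means no shift."""
    p = neighbor_label_distribution(adj_a, labels, node, num_classes)
    q = neighbor_label_distribution(adj_b, labels, node, num_classes)
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

labels = [0, 0, 1, 1, 1]
adj_env_a = {0: [1, 2]}        # node 0's neighbors in environment A
adj_env_b = {0: [2, 3, 4]}     # same node, environment B
shift = neighbor_shift(adj_env_a, adj_env_b, labels, 0, 2)
```

Here node 0 sees a balanced neighbor label mix in A but an all-class-1 neighborhood in B, giving a shift of 0.5.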
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1690–1701.
Citations: 0
Beyond Accurate Distillation: Calibrated Knowledge Distillation for Reliable Predictions
Pub Date: 2025-09-04 | DOI: 10.1109/TAI.2025.3605902
Ishan Mishra;Vamsi Krishna Sethu;Deepak Mishra
Knowledge distillation (KD) is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which, in general, is comparatively large and deep. These teacher networks are pretrained and often uncalibrated, as no calibration technique is applied to the teacher model while training. Calibration of a network measures the probability of correctness for any of its predictions, which is crucial for high-risk domains where reliable predictions are essential. In this article, we study how to obtain a calibrated student from an uncalibrated teacher. Our approach relies on the fusion of the data-augmentation techniques, including but not limited to Mixup and CutMix, with KD. We incorporate and analyze the impact of focal loss in the distillation framework to further improve the calibration of the student model. We perform extensive experiments to validate our approach on various datasets, including CIFAR-100, TinyImageNet, ImageNet, and diabetic retinopathy (DR) datasets, and compare it with various techniques such as contrastive representation distillation (CRD), relational knowledge distillation (RKD), decoupled knowledge distillation (DKD), and multilevel logit distillation (MLLD) to obtain calibrated student models. Furthermore, we conduct an ablation study to dissect the influence of augmentation techniques and the integration of focal loss. Additionally, we assess the robustness of our approach by evaluating its performance on corrupted CIFAR-100C data, demonstrating its consistent and reliable outcomes even under challenging conditions.
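Calibration in this sense is commonly measured with the expected calibration error (ECE). The minimal implementation below is independent of the article's specific distillation setup; the example inputs are invented.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error: bin predictions by confidence and
    average the |accuracy - confidence| gap, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

# A model that is 90% confident but only 50% correct is poorly calibrated.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])
```

A perfectly calibrated model would have its average confidence match its accuracy in every bin, driving the ECE to zero.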
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1702–1714.
Citations: 0
A Positive-Unlabeled Learning Approach With Self-Correcting Regularized Risk
Pub Date: 2025-09-04 | DOI: 10.1109/TAI.2025.3605890
Sayantan Saha;Atif Hassan;Jiaul H. Paik
The primary objective of positive-unlabeled (PU) learning is to train a binary classifier from positively labeled data and unlabeled data. An inherent aspect of this approach involves incorporating the positive class prior of the unlabeled data directly into the classification process, which is typically challenging in real-world scenarios. Moreover, existing studies rarely evaluate PU classifiers without access to the positive class prior (true or estimated) of the unlabeled data, which represents a significant research gap. In this article, we introduce a robust, two-step PU learning algorithm by incorporating a potential negative sampler in step 1 (warm start) and minimizing a self-correcting regularized risk function in step 2. The risk function possesses a self-correcting property that attempts to mitigate the weakness of the potential negative sampler in the warm start step. The risk function enables us to enhance robustness in the presence of mislabeled candidate negative samples. We demonstrate the effectiveness of our method on image as well as text benchmarks.
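A highly simplified sketch of the two-step shape described above follows; the selection rule, threshold, and weighting function are illustrative assumptions, not the paper's exact sampler or risk function.

```python
def select_candidate_negatives(unlabeled_scores, fraction=0.3):
    """Step 1 (warm start) idea: treat the lowest-scoring fraction of the
    unlabeled pool as candidate negatives."""
    ranked = sorted(range(len(unlabeled_scores)),
                    key=lambda i: unlabeled_scores[i])
    k = max(1, int(fraction * len(unlabeled_scores)))
    return ranked[:k]

def self_correcting_weights(losses, threshold):
    """Step 2 idea: downweight candidate negatives whose loss is large,
    since those are the ones most likely to be mislabeled positives."""
    return [1.0 if l <= threshold else threshold / l for l in losses]

scores = [0.9, 0.1, 0.8, 0.2, 0.05]            # classifier scores on unlabeled data
candidates = select_candidate_negatives(scores, fraction=0.4)
weights = self_correcting_weights([0.2, 1.5, 0.4], threshold=0.5)
```

The second candidate has an unusually large loss, so its contribution to the regularized risk is shrunk rather than trusted at face value.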
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1677–1689.
Citations: 0
ARD-LoRA: Dynamic Rank Allocation for Parameter-Efficient Fine-Tuning of Foundation Models With Heterogeneous Adaptation Needs
Pub Date: 2025-09-04 | DOI: 10.1109/TAI.2025.3605569
Haseeb Ullah Khan Shinwari;Muhammad Usama
Conventional low-rank adaptation (LoRA) methods employ a fixed rank, imposing uniform adaptation across transformer layers and attention heads despite their heterogeneous learning dynamics. This article introduces adaptive rank dynamic LoRA (ARD-LoRA), a novel framework that automates rank allocation through learnable scaling factors. These factors are optimized via a meta-objective balancing task performance and parameter efficiency, incorporating $\ell_{1}$ sparsity for minimal rank and total variation regularization for stable rank transitions. ARD-LoRA enables continuous, differentiable, per-head rank adaptation. Experiments on LLAMA-3.1-70B and PaliGemma-2 demonstrate ARD-LoRA’s efficacy, achieving up to 99.3% of full fine-tuning performance with only 0.32% trainable parameters, outperforming strong baselines such as DoRA and AdaLoRA. Furthermore, it reduces multimodal adaptation memory by 41%. These results establish dynamic, fine-grained rank allocation as a critical paradigm for efficient foundation model adaptation.
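The two regularizers named in the abstract are standard penalties on the vector of per-head scaling factors; a minimal sketch follows, with placeholder λ coefficients and scale values that are not the paper's.

```python
def l1_sparsity(scales):
    """ℓ1 penalty pushing per-head rank-scaling factors toward zero
    (encouraging minimal effective rank)."""
    return sum(abs(s) for s in scales)

def total_variation(scales):
    """Total-variation penalty discouraging abrupt changes between
    adjacent scaling factors (stable rank transitions)."""
    return sum(abs(b - a) for a, b in zip(scales, scales[1:]))

def meta_objective(task_loss, scales, lam1=0.01, lam2=0.1):
    """Sketch of a balanced objective: task performance plus the two
    rank-allocation regularizers described in the abstract."""
    return task_loss + lam1 * l1_sparsity(scales) + lam2 * total_variation(scales)

scales = [1.0, 0.8, 0.9, 0.0]   # hypothetical per-head scaling factors
obj = meta_objective(task_loss=2.0, scales=scales)
```

During training, gradients through such an objective would shrink unneeded heads' scales while keeping neighboring scales from jumping erratically.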
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1666–1676.
Citations: 0
Watermarking Language Models Through Language Models
Pub Date: 2025-09-02 | DOI: 10.1109/TAI.2025.3605117
Agnibh Dasgupta;Abdullah All Tanvir;Xin Zhong
Watermarking the outputs of large language models (LLMs) is critical for provenance tracing, content regulation, and model accountability. Existing approaches often rely on access to model internals or are constrained by static rules and token-level perturbations. Moreover, the idea of steering generative behavior via prompt-based instruction control remains largely underexplored. We introduce a prompt-guided watermarking framework that operates entirely at the input level and requires no access to model parameters or decoding logits. The framework comprises three cooperating components: a prompting LM that synthesizes watermarking instructions from user prompts, a marking LM that generates watermarked outputs conditioned on these instructions, and a detecting LM trained to classify whether a response carries an embedded watermark. This modular design enables dynamic watermarking that adapts to individual prompts while remaining compatible with diverse LLM architectures, including both proprietary and open-weight models. We evaluate the framework over 25 combinations of prompting and marking LMs, such as GPT-4o, Mistral, LLaMA3, and DeepSeek. Experimental results show that watermark signals generalize across architectures and remain robust under fine-tuning, model distillation, and prompt-based adversarial attacks, demonstrating the effectiveness and robustness of the proposed approach.
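The three-component pipeline can be illustrated with toy stand-ins; no real LMs are involved, and the instruction, generator, and detector below are deliberately trivial placeholders rather than the paper's models.

```python
def prompting_lm(user_prompt):
    """Stands in for the prompting LM: synthesize a watermarking
    instruction from the user's prompt (here, a fixed toy rule)."""
    return "begin every sentence with a word containing the letter 'e'"

def marking_lm(user_prompt, instruction):
    """Stands in for the marking LM: a real model would answer the prompt
    while following the watermark instruction."""
    return "Every model response here embeds the agreed pattern."

def detecting_lm(response):
    """Stands in for the detecting LM: classify whether the response
    carries the embedded watermark (toy first-word check)."""
    first_words = [s.split()[0].lower()
                   for s in response.split(".") if s.strip()]
    return all("e" in w for w in first_words)

instruction = prompting_lm("Explain federated learning.")
response = marking_lm("Explain federated learning.", instruction)
```

Because the watermark lives entirely in the input-level instruction, the same pipeline shape works for any generator, which is the modularity the framework relies on.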
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1642–1651.
Citations: 0
FedImp: Enhancing Federated Learning Convergence With Impurity-Based Weighting
Pub Date: 2025-09-02 | DOI: 10.1109/TAI.2025.3605307
Hai Anh Tran;Cuong Ta;Truong X. Tran
Federated learning (FL) is a collaborative paradigm that enables multiple devices to train a global model while preserving local data privacy. A major challenge in FL is the nonindependent and identically distributed (non-IID) nature of data across devices, which hinders training efficiency and slows convergence. To tackle this, we propose federated impurity weighting (FedImp), a novel algorithm that quantifies each device’s contribution based on the informational content of its local data. These contributions are normalized to compute distinct aggregation weights for the global model update. Extensive experiments on EMNIST and CIFAR-10 datasets show that FedImp significantly improves convergence speed, reducing communication rounds by up to 64.4%, 27.8%, and 66.7% on EMNIST, and 44.2%, 44%, and 25.6% on CIFAR-10 compared to FedAvg, FedProx, and FedAdp, respectively. Under highly imbalanced data distributions, FedImp outperforms all baselines and achieves the highest accuracy. Overall, FedImp offers an effective solution to enhance FL efficiency in non-IID settings.
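An impurity-based weighting scheme of this kind can be sketched as follows, using Gini impurity as one standard impurity measure; the paper's exact quantification of informational content may differ, and the client data below is invented.

```python
def gini_impurity(label_counts):
    """Gini impurity of a client's local label distribution; higher means
    more class-diverse (and, under this heuristic, more informative) data."""
    total = sum(label_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in label_counts)

def aggregation_weights(clients_label_counts):
    """Normalize per-client impurities into aggregation weights."""
    imps = [gini_impurity(c) for c in clients_label_counts]
    z = sum(imps) or 1.0
    return [i / z for i in imps]

def aggregate(models, weights):
    """Weighted average of client model parameters (flat vectors here)."""
    return [sum(w * m[j] for w, m in zip(weights, models))
            for j in range(len(models[0]))]

counts = [[50, 50], [95, 5]]    # client 0 balanced, client 1 heavily skewed
w = aggregation_weights(counts)
global_model = aggregate([[1.0, 2.0], [3.0, 4.0]], w)
```

Under non-IID partitions, the balanced client receives a larger aggregation weight than the skewed one, which is the mechanism the abstract credits for faster convergence.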
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1652–1665.
Citations: 0
IEEE Transactions on Artificial Intelligence Publication Information
Pub Date: 2025-09-01 | DOI: 10.1109/TAI.2025.3599608
IEEE Transactions on Artificial Intelligence, vol. 6, no. 9, p. C2 (open access).
Citations: 0
Online Safety Analysis for LLMs: A Benchmark, an Assessment, and a Path Forward
Pub Date: 2025-08-29 | DOI: 10.1109/TAI.2025.3603547
Xuan Xie;Jiayang Song;Zhehua Zhou;Yuheng Huang;Da Song;Lei Ma
While large language models (LLMs) have seen widespread applications across numerous fields, their limited interpretability poses concerns regarding their safe operations from multiple aspects, e.g., truthfulness and toxicity. Recent research has started developing quality assurance methods for LLMs, introducing techniques such as offline detectors or uncertainty estimation methods. However, these approaches mainly focus on postgeneration analysis, leaving the online safety analysis for LLMs during the generation phase an unexplored area. To bridge this gap, we conduct in this work a comprehensive evaluation of the effectiveness of existing online safety analysis methods on LLMs. We begin with a pilot study that validates the feasibility of detecting unsafe outputs in the early generation process. Following this, we establish the first publicly available benchmark of online safety analysis for LLMs, including a broad spectrum of methods, models, tasks, datasets, and evaluation metrics. Utilizing this benchmark, we extensively analyze the performance of state-of-the-art online safety analysis methods on both open-source and closed-source LLMs. This analysis reveals the strengths and weaknesses of individual methods and offers valuable insights into selecting the most appropriate method based on specific application scenarios and task requirements. Furthermore, we also explore the potential of using hybridization methods, i.e., combining multiple methods to derive a collective safety conclusion, to enhance the efficacy of online safety analysis. Our findings indicate a promising direction for the development of trustworthy assurance methodologies for LLMs, facilitating their reliable deployments across diverse domains.
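The hybridization idea, combining several per-generation safety monitors into one collective verdict, can be sketched as a simple majority vote; this is an illustrative combiner, not the paper's specific rule, and the monitor names and scores are hypothetical.

```python
def hybrid_safety_verdict(detector_scores, threshold=0.5):
    """Combine several online safety detectors into one collective verdict:
    flag the generation as unsafe if a strict majority of detectors
    report a score above the threshold."""
    votes = sum(1 for s in detector_scores if s > threshold)
    return votes * 2 > len(detector_scores)

# e.g., hypothetical toxicity, truthfulness, and uncertainty monitors
# running during generation
verdict = hybrid_safety_verdict([0.9, 0.6, 0.2])
```

Because the combiner only consumes scalar scores, any mix of offline detectors and uncertainty estimators can feed into it during generation.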
{"title":"Online Safety Analysis for LLMs: A Benchmark, an Assessment, and a Path Forward","authors":"Xuan Xie;Jiayang Song;Zhehua Zhou;Yuheng Huang;Da Song;Lei Ma","doi":"10.1109/TAI.2025.3603547","DOIUrl":"https://doi.org/10.1109/TAI.2025.3603547","url":null,"abstract":"While large language models (LLMs) have seen widespread applications across numerous fields, their limited interpretability poses concerns regarding their safe operations from multiple aspects, e.g., truthfulness and toxicity. Recent research has started developing quality assurance methods for LLMs, introducing techniques such as offline detectors or uncertainty estimation methods. However, these approaches mainly focus on postgeneration analysis, leaving the online safety analysis for LLMs during the generation phase an unexplored area. To bridge this gap, we conduct in this work a comprehensive evaluation of the effectiveness of existing online safety analysis methods on LLMs. We begin with a pilot study that validates the feasibility of detecting unsafe outputs in the early generation process. Following this, we establish the first publicly available benchmark of online safety analysis for LLMs, including a broad spectrum of methods, models, tasks, datasets, and evaluation metrics. Utilizing this benchmark, we extensively analyze the performance of state-of-the-art online safety analysis methods on both open-source and closed-source LLMs. This analysis reveals the strengths and weaknesses of individual methods and offers valuable insights into selecting the most appropriate method based on specific application scenarios and task requirements. Furthermore, we also explore the potential of using hybridization methods, i.e., combining multiple methods to derive a collective safety conclusion, to enhance the efficacy of online safety analysis. 
Our findings indicate a promising direction for the development of trustworthy assurance methodologies for LLMs, facilitating their reliable deployments across diverse domains.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 3","pages":"1626-1641"},"PeriodicalIF":0.0,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147299698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
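The hybridization idea described in the abstract, combining several online safety analyzers into one collective conclusion, can be illustrated as a weighted vote over per-method unsafety scores. This is a hypothetical sketch; the analyzer names, weights, and threshold below are illustrative and not taken from the paper.

```python
# Hypothetical sketch: fuse per-method "unsafe" scores from several online
# safety analyzers into one collective verdict via a weighted vote.
# Analyzer names, weights, and the 0.5 threshold are illustrative only.

def hybrid_safety_verdict(scores: dict, weights: dict,
                          threshold: float = 0.5) -> tuple:
    """Return (combined unsafety score in [0, 1], is_unsafe flag)."""
    total_w = sum(weights[name] for name in scores)
    combined = sum(scores[name] * weights[name] for name in scores) / total_w
    return combined, combined >= threshold

# Each analyzer scores the partial generation online, during decoding.
scores = {"uncertainty": 0.8, "toxicity_probe": 0.4, "truthfulness_probe": 0.7}
weights = {"uncertainty": 1.0, "toxicity_probe": 2.0, "truthfulness_probe": 1.0}

combined, unsafe = hybrid_safety_verdict(scores, weights)
print(round(combined, 3), unsafe)  # prints: 0.575 True
```

A weighted average is only one way to derive a collective conclusion; a max-rule (flag if any single analyzer fires) trades precision for recall in the same framework.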
NeuroAMP: A Novel End-to-End General Purpose Deep Neural Amplifier for Personalized Hearing Aids
Pub Date : 2025-08-29 DOI: 10.1109/TAI.2025.3603538
Shafique Ahmed;Ryandhimas E. Zezario;Hui-Guan Yuan;Amir Hussain;Hsin-Min Wang;Wei-Ho Chung;Yu Tsao
The prevalence of hearing aids is increasing. However, optimizing their amplification remains challenging due to the complexity of integrating multiple components in traditional methods. To address this, we present NeuroAMP, a novel deep neural network for end-to-end, personalized amplification in hearing aids. NeuroAMP leverages spectral features and the listener’s audiogram as inputs, and we explore four architectures: convolutional neural network (CNN), long short-term memory (LSTM), convolutional recurrent neural network (CRNN), and Transformer. We also introduce Speech Enhancement NeuroAMP (SE-NeuroAMP), an extension that integrates noise reduction with amplification for improved real-world performance. To enhance generalization, we employed a comprehensive data augmentation strategy during training on diverse speech (TIMIT, TMHINT) and music (Cadenza Challenge MUSIC) datasets. Evaluation using the Hearing Aid Speech Perception Index (HASPI), Hearing Aid Speech Quality Index (HASQI), and Hearing Aid Audio Quality Index (HAAQI) shows that the Transformer-based NeuroAMP achieves the best performance, with SRCC scores of 0.992 (HASPI) and 0.990 (HASQI) on TIMIT, and 0.9738 (HAAQI) on the Cadenza dataset. Notably, the augmentation strategy maintains robust performance on unseen datasets (e.g., VoiceBank-DEMAND and MUSDB18-HQ). Furthermore, SE-NeuroAMP outperforms both the conventional NAL-R+WDRC method and a two-stage baseline on the VoiceBank-DEMAND dataset, achieving HASPI of 0.90 and HASQI of 0.59. These results highlight the strong potential of NeuroAMP and SE-NeuroAMP to provide a novel and effective framework for personalized hearing aid amplification.
{"title":"NeuroAMP: A Novel End-to-End General Purpose Deep Neural Amplifier for Personalized Hearing Aids","authors":"Shafique Ahmed;Ryandhimas E. Zezario;Hui-Guan Yuan;Amir Hussain;Hsin-Min Wang;Wei-Ho Chung;Yu Tsao","doi":"10.1109/TAI.2025.3603538","DOIUrl":"https://doi.org/10.1109/TAI.2025.3603538","url":null,"abstract":"The prevalence of hearing aids is increasing. However, optimizing their amplification remains challenging due to the complexity of integrating multiple components in traditional methods. To address this, we present NeuroAMP, a novel deep neural network for end-to-end, personalized amplification in hearing aids. NeuroAMP leverages spectral features and the listener’s audiogram as inputs, and we explore four architectures: convolutional neural network (CNN), long short-term memory (LSTM), convolutional recurrent neural network (CRNN), and Transformer. We also introduce Speech Enhancement NeuroAMP (SE-NeuroAMP), an extension that integrates noise reduction with amplification for improved real-world performance. To enhance generalization, we employed a comprehensive data augmentation strategy during training on diverse speech (TIMIT, TMHINT) and music (Cadenza Challenge MUSIC) datasets. Evaluation using the Hearing Aid Speech Perception Index (HASPI), Hearing Aid Speech Quality Index (HASQI), and Hearing Aid Audio Quality Index (HAAQI) shows that the Transformer-based NeuroAMP achieves the best performance, with SRCC scores of 0.992 (HASPI) and 0.990 (HASQI) on TIMIT, and 0.9738 (HAAQI) on Cadenza dataset. Notably, the augmentation strategy maintains robust performance on unseen datasets (e.g., VoiceBank-DEMAND and MUSDB18-HQ). Furthermore, SE-NeuroAMP outperforms both the conventional NAL-R+WDRC method and a two-stage baseline on the VoiceBank-DEMAND dataset, achieving HASPI of 0.90 and HASQI of 0.59. 
These results highlight the strong potential of NeuroAMP and SE-NeuroAMP to provide a novel and effective framework for personalized hearing aid amplification.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 3","pages":"1610-1625"},"PeriodicalIF":0.0,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11145141","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147299706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
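The SRCC figures quoted in the abstract measure rank agreement between a model's predicted quality scores and the reference metric values. A minimal Spearman rank correlation coefficient, written here from its standard definition (Pearson correlation on average ranks) rather than taken from the paper's evaluation code, looks like:

```python
# Minimal Spearman rank correlation (SRCC): Pearson correlation computed on
# average ranks, with ties resolved by rank averaging. Illustrative sketch,
# not the paper's evaluation code.

def _ranks(values):
    """Average 1-based ranks, ties receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def srcc(x, y):
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Any monotonic relationship yields identical ranks, so SRCC is ~1.0.
print(srcc([0.1, 0.4, 0.2, 0.9], [1.0, 2.5, 1.5, 3.9]))
```

Because SRCC depends only on ranks, a score near 1.0 (as reported for NeuroAMP) means the model orders samples by quality almost exactly as the reference metric does, even if the absolute values differ.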
Journal: IEEE transactions on artificial intelligence