Lightweight deep neural networks have played a pivotal role in real-time semantic segmentation for autonomous driving on resource-constrained devices, where local semantics and global context must be learned effectively at multiple scales due to varying object sizes. Recent methods design shallow, lightweight backbones with a small receptive field for faster inference, along with additional mechanisms such as attention to compensate for the accuracy loss caused by the lightweight design. While some methods have exploited multiscale feature learning by attaching pyramid modules at the encoder end, it is often neglected at the fundamental block level because it increases inference time. Furthermore, attention weights are mostly generated at a single object scale using only the high-level feature representations. To solve the first problem, a key module for the basic block, the fast hybrid module, is proposed. This module learns multiscale features by combining dilated kernels and downsampling operations in a parallel three-branch structure. To solve the second problem, a novel attention module, the multiscale attention module (MSAM), is proposed. MSAM generates context weights at varying scales from the low-level features rich in object boundary and edge information and multiplies them by the high-level semantic features obtained from the encoder. With these modules, a novel encoder–decoder network named the context-guided multiscale attention network is proposed. With only 0.54 million parameters, the network achieves 73.4% and 68.1% mean IoU at 128.24 and 85.5 FPS on the Cityscapes and CamVid datasets, respectively. In addition, the network can run in real time on resource-constrained embedded GPUs. Extensive ablation studies show the effectiveness of the proposed modules and network, and qualitative results on unseen data demonstrate the robustness of the method.
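The parallel three-branch idea described above — a local branch, a dilated branch, and a downsampled context branch — could be sketched as follows. This is a hypothetical minimal sketch, not the authors' exact design: the branch widths, dilation rate, depthwise convolutions, and sum-then-fuse rule are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastHybridModuleSketch(nn.Module):
    """Hypothetical sketch of a three-branch hybrid block: a plain 3x3
    branch (local semantics), a dilated 3x3 branch (larger receptive
    field), and a downsample-then-upsample branch (global context).
    Depthwise convolutions keep the parameter count low."""

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.dilated = nn.Conv2d(channels, channels, 3, padding=dilation,
                                 dilation=dilation, groups=channels)
        self.pool = nn.AvgPool2d(2)
        self.context = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.fuse = nn.Conv2d(channels, channels, 1)  # pointwise fusion

    def forward(self, x):
        h, w = x.shape[-2:]
        a = self.local(x)
        b = self.dilated(x)
        c = F.interpolate(self.context(self.pool(x)), size=(h, w),
                          mode="bilinear", align_corners=False)
        return self.fuse(a + b + c) + x  # residual connection

x = torch.randn(1, 16, 32, 32)
y = FastHybridModuleSketch(16)(x)
assert y.shape == x.shape
```

The residual connection and 1x1 fusion mirror common lightweight-backbone practice; the paper's actual block may fuse branches differently.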
"Context-Guided Multiscale Attention for Real-Time Semantic Segmentation of Road Scene" — Saquib Mazhar; Nadeem Atif; M.K. Bhuyan; Shaik Rafi Ahamed. IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1761–1775. Pub Date: 2025-09-08 | DOI: 10.1109/TAI.2025.3606904
Pub Date: 2025-09-05 | DOI: 10.1109/TAI.2025.3605894
Jiaxing Li;Jiayi Gao;Binhao Gu;Youyong Kong
Graph neural networks (GNNs) have achieved strong performance on various graph learning tasks under the assumption of independently and identically distributed (IID) data. However, recent studies reveal that GNNs suffer from performance drops under distribution shifts, prompting growing interest in out-of-distribution (OOD) generalization. In this work, we identify a previously underexplored challenge, Neighbor Shift, which refers to structural inconsistencies in node neighborhoods across environments. We analyze its characteristics and demonstrate its negative impact on node-level classification. To tackle this issue, we propose the neighbor-shift robust GNN (NSRGNN), which disentangles invariant and variant subgraphs through conflict-based structure analysis, infers latent environments using the variant components, and regularizes semantic consistency of node representations across inferred environments. Extensive experiments on both real-world and synthetic benchmarks show that NSRGNN consistently outperforms strong OOD baselines and exhibits robust generalization under diverse structural shifts.
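The abstract's "consistency across inferred environments" regularizer could resemble a variance-across-environment-risks penalty in the spirit of V-REx; this is an illustrative stand-in, and NSRGNN's actual regularizer may take a different form.

```python
import torch

def environment_consistency_penalty(per_env_risks):
    """Variance of per-environment risks: driving this toward zero makes
    the model behave consistently across inferred environments. A
    V-REx-style stand-in, not the paper's exact objective."""
    risks = torch.stack(per_env_risks)
    return risks.var(unbiased=False)

# identical risks across environments incur no penalty
penalty = environment_consistency_penalty(
    [torch.tensor(0.9), torch.tensor(1.1)]
)
```

In a full training loop this penalty would be added, with some weight, to the average task risk computed over the inferred environments.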
"Let Invariant Learning Inspire Neighbor-Shift Generalization on Graphs" — IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1690–1701.
Pub Date: 2025-09-04 | DOI: 10.1109/TAI.2025.3605902
Ishan Mishra;Vamsi Krishna Sethu;Deepak Mishra
Knowledge distillation (KD) is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which is, in general, comparatively large and deep. These teacher networks are pretrained and often uncalibrated, as no calibration technique is applied to the teacher model during training. Calibration measures how well a network's predicted confidence reflects the probability that its predictions are correct, which is crucial in high-risk domains where reliable predictions are essential. In this article, we study how to obtain a calibrated student from an uncalibrated teacher. Our approach relies on the fusion of data-augmentation techniques, including but not limited to Mixup and CutMix, with KD. We incorporate and analyze the impact of focal loss in the distillation framework to further improve the calibration of the student model. We perform extensive experiments to validate our approach on various datasets, including CIFAR-100, TinyImageNet, ImageNet, and diabetic retinopathy (DR) datasets, and compare it with various techniques such as contrastive representation distillation (CRD), relational knowledge distillation (RKD), decoupled knowledge distillation (DKD), and multilevel logit distillation (MLLD) to obtain calibrated student models. Furthermore, we conduct an ablation study to dissect the influence of augmentation techniques and the integration of focal loss. Additionally, we assess the robustness of our approach by evaluating its performance on corrupted CIFAR-100C data, demonstrating its consistent and reliable outcomes even under challenging conditions.
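One plausible way to combine focal loss with a temperature-scaled distillation term, as the abstract describes, is sketched below. The weighting `alpha`, temperature `T`, and exact fusion with Mixup/CutMix targets are assumptions, not the authors' final objective; augmented inputs would simply be fed to both networks before this loss is computed.

```python
import torch
import torch.nn.functional as F

def focal_distillation_loss(student_logits, teacher_logits, target,
                            gamma=2.0, alpha=0.5, T=4.0):
    """Sketch: focal cross-entropy (down-weights easy examples, which
    helps calibration) plus a KL distillation term at temperature T."""
    # focal term: (1 - p_t)^gamma * CE, averaged over the batch
    log_p = F.log_softmax(student_logits, dim=1)
    ce = F.nll_loss(log_p, target, reduction="none")
    p_t = log_p.gather(1, target.unsqueeze(1)).squeeze(1).exp()
    focal = ((1.0 - p_t) ** gamma * ce).mean()
    # distillation term: KL between temperature-softened distributions
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return alpha * focal + (1.0 - alpha) * kd

loss = focal_distillation_loss(torch.randn(4, 10),
                               torch.randn(4, 10),
                               torch.randint(0, 10, (4,)))
```

The T² factor is the standard gradient-scale correction for temperature-softened distillation.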
"Beyond Accurate Distillation: Calibrated Knowledge Distillation for Reliable Predictions" — IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1702–1714.
Pub Date: 2025-09-04 | DOI: 10.1109/TAI.2025.3605890
Sayantan Saha;Atif Hassan;Jiaul H. Paik
The primary objective of positive unlabeled learning (PU learning) is to train a binary classifier with positively labeled data and unlabeled data. An inherent aspect of this approach involves incorporating the positive class prior of unlabeled data directly into the classification process, which is typically challenging in real-world scenarios. Moreover, existing studies often lack evaluations of PU classifiers without involving the positive class prior (true or estimated) of the unlabeled data, representing a significant research gap. In this article, we introduce a robust, two-step PU learning algorithm by incorporating a potential negative sampler in step 1 (warm start) and minimizing a self-correcting regularized risk function in step 2. The risk function possesses a self-correcting property that attempts to mitigate the weakness of the potential negative sampler in the warm start step. The risk function enables us to enhance robustness in the presence of mislabeled candidate negative samples. We demonstrate the effectiveness of our method on image as well as text benchmarks. Results show that the proposed method consistently outperforms the state-of-the-art (SOTA) PU learning algorithms.
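The two-step scheme above could be sketched as follows. Both the warm-start scorer and the self-correcting weighting rule here are placeholders: a minimal illustration of sampling candidate negatives and then down-weighting the ones the current model believes are actually positive, not the paper's exact sampler or risk function.

```python
import numpy as np

def warm_start_negatives(scores_unlabeled, keep_frac=0.3):
    """Step 1 (warm start): keep the unlabeled points a rough scorer
    ranks least positive-like as candidate negatives."""
    k = int(len(scores_unlabeled) * keep_frac)
    return np.argsort(scores_unlabeled)[:k]

def self_correcting_weights(current_scores, candidate_idx):
    """Step 2: down-weight candidate negatives the current model is
    confident are actually positive, mitigating warm-start mistakes."""
    w = 1.0 - current_scores[candidate_idx]  # high positive score -> low weight
    return w / w.sum()

scores = np.array([0.9, 0.1, 0.5, 0.2, 0.8])  # toy positive-class scores
idx = warm_start_negatives(scores, keep_frac=0.4)
weights = self_correcting_weights(scores, idx)
```

In training, `weights` would rescale the per-sample losses of the candidate negatives inside the regularized risk, and the scores would be refreshed as the classifier improves.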
"A Positive-Unlabeled Learning Approach With Self-Correcting Regularized Risk" — IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1677–1689.
Pub Date: 2025-09-04 | DOI: 10.1109/TAI.2025.3605569
Haseeb Ullah Khan Shinwari;Muhammad Usama
Conventional low-rank adaptation (LoRA) methods employ a fixed rank, imposing uniform adaptation across transformer layers and attention heads despite their heterogeneous learning dynamics. This article introduces adaptive rank dynamic LoRA (ARD-LoRA), a novel framework that automates rank allocation through learnable scaling factors. These factors are optimized via a meta-objective balancing task performance and parameter efficiency, incorporating $\ell_{1}$ sparsity for minimal rank and total variation regularization for stable rank transitions. ARD-LoRA enables continuous, differentiable, per-head rank adaptation. Experiments on LLAMA-3.1-70B and PaliGemma-2 demonstrate ARD-LoRA’s efficacy, achieving up to 99.3% of full fine-tuning performance with only 0.32% trainable parameters, outperforming strong baselines such as DoRA and AdaLoRA. Furthermore, it reduces multimodal adaptation memory by 41%. These results establish dynamic, fine-grained rank allocation as a critical paradigm for efficient foundation model adaptation.
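The two regularizers the abstract names — an ℓ1 term for minimal rank and a total-variation term for stable rank transitions — could look like the sketch below, applied to a tensor of per-layer, per-head scaling factors. The tensor shape, the direction of the TV difference (across layers), and the λ values are assumptions; the paper's meta-objective may weight or shape these terms differently.

```python
import torch

def rank_regularizer(alpha, lam_l1=1e-3, lam_tv=1e-3):
    """alpha: (num_layers, num_heads) learnable rank-scaling factors.
    l1 pushes factors toward zero (minimal effective rank); the
    total-variation term penalizes abrupt changes between adjacent
    layers (stable rank transitions)."""
    l1 = alpha.abs().sum()
    tv = (alpha[1:] - alpha[:-1]).abs().sum()  # differences across layers
    return lam_l1 * l1 + lam_tv * tv

alpha = torch.ones(4, 8, requires_grad=True)  # 4 layers, 8 heads
reg = rank_regularizer(alpha)
```

During fine-tuning this term would be added to the task loss, and `alpha` would scale each head's low-rank update so that gradient descent can effectively shrink or grow per-head rank.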
"ARD-LoRA: Dynamic Rank Allocation for Parameter-Efficient Fine-Tuning of Foundation Models With Heterogeneous Adaptation Needs" — IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1666–1676.
Pub Date: 2025-09-02 | DOI: 10.1109/TAI.2025.3605117
Agnibh Dasgupta;Abdullah All Tanvir;Xin Zhong
Watermarking the outputs of large language models (LLMs) is critical for provenance tracing, content regulation, and model accountability. Existing approaches often rely on access to model internals or are constrained by static rules and token-level perturbations. Moreover, the idea of steering generative behavior via prompt-based instruction control remains largely underexplored. We introduce a prompt-guided watermarking framework that operates entirely at the input level and requires no access to model parameters or decoding logits. The framework comprises three cooperating components: a prompting LM that synthesizes watermarking instructions from user prompts, a marking LM that generates watermarked outputs conditioned on these instructions, and a detecting LM trained to classify whether a response carries an embedded watermark. This modular design enables dynamic watermarking that adapts to individual prompts while remaining compatible with diverse LLM architectures, including both proprietary and open-weight models. We evaluate the framework over 25 combinations of prompting and marking LMs, such as GPT-4o, Mistral, LLaMA3, and DeepSeek. Experimental results show that watermark signals generalize across architectures and remain robust under fine-tuning, model distillation, and prompt-based adversarial attacks, demonstrating the effectiveness and robustness of the proposed approach.
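The three-component, input-level pipeline described above can be sketched as plain function composition; the three callables are placeholders for real LM endpoints, and the prompt template is invented for illustration.

```python
def watermark_pipeline(user_prompt, prompting_lm, marking_lm, detecting_lm):
    """Sketch of the abstract's pipeline: a prompting LM writes a
    watermark instruction, a marking LM generates text conditioned on
    prompt + instruction, and a detecting LM classifies the result.
    No model internals or decoding logits are touched."""
    instruction = prompting_lm(user_prompt)
    response = marking_lm(
        f"{user_prompt}\n\n[Watermark instruction]: {instruction}"
    )
    is_watermarked = detecting_lm(response)
    return response, is_watermarked

# toy stand-ins for the three LMs
resp, flag = watermark_pipeline(
    "Explain photosynthesis.",
    prompting_lm=lambda p: "prefer sentences with an even word count",
    marking_lm=lambda p: "toy watermarked response",
    detecting_lm=lambda r: True,
)
```

Because only the input is modified, the prompting and marking roles can be filled by any mix of proprietary and open-weight models, which is what makes the 25 prompting/marking combinations in the evaluation possible.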
"Watermarking Language Models Through Language Models" — IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1642–1651.
Pub Date: 2025-09-02 | DOI: 10.1109/TAI.2025.3605307
Hai Anh Tran;Cuong Ta;Truong X. Tran
Federated learning (FL) is a collaborative paradigm that enables multiple devices to train a global model while preserving local data privacy. A major challenge in FL is the nonindependent and identically distributed (non-IID) nature of data across devices, which hinders training efficiency and slows convergence. To tackle this, we propose federated impurity weighting (FedImp), a novel algorithm that quantifies each device’s contribution based on the informational content of its local data. These contributions are normalized to compute distinct aggregation weights for the global model update. Extensive experiments on EMNIST and CIFAR-10 datasets show that FedImp significantly improves convergence speed, reducing communication rounds by up to 64.4%, 27.8%, and 66.7% on EMNIST, and 44.2%, 44%, and 25.6% on CIFAR-10 compared to FedAvg, FedProx, and FedAdp, respectively. Under highly imbalanced data distributions, FedImp outperforms all baselines and achieves the highest accuracy. Overall, FedImp offers an effective solution to enhance FL efficiency in non-IID settings.
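One way to turn "informational content of local data" into normalized aggregation weights, as the abstract describes, is via the entropy of each device's label distribution. The abstract does not give FedImp's exact formula, so the Shannon-entropy choice below is an assumption.

```python
import numpy as np

def impurity_weights(label_counts_per_device):
    """Sketch: score each device by the Shannon entropy of its local
    label distribution (balanced data -> high impurity -> more
    information), then normalize the scores into aggregation weights."""
    scores = []
    for counts in label_counts_per_device:
        p = np.asarray(counts, dtype=float)
        p = p / p.sum()
        p = p[p > 0]                      # avoid log(0)
        scores.append(-(p * np.log(p)).sum())
    w = np.asarray(scores)
    return w / w.sum()

# device 0 has balanced labels, device 1 is highly skewed
w = impurity_weights([[5, 5], [9, 1]])
```

In a FedAvg-style update, each device's model delta would then be multiplied by its weight before summation, instead of weighting purely by sample count.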
"FedImp: Enhancing Federated Learning Convergence With Impurity-Based Weighting" — IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1652–1665.
Pub Date: 2025-08-29 | DOI: 10.1109/TAI.2025.3603547
Xuan Xie;Jiayang Song;Zhehua Zhou;Yuheng Huang;Da Song;Lei Ma
While large language models (LLMs) have seen widespread applications across numerous fields, their limited interpretability poses concerns regarding their safe operations from multiple aspects, e.g., truthfulness and toxicity. Recent research has started developing quality assurance methods for LLMs, introducing techniques such as offline detectors or uncertainty estimation methods. However, these approaches mainly focus on postgeneration analysis, leaving online safety analysis for LLMs during the generation phase unexplored. To bridge this gap, in this work we conduct a comprehensive evaluation of the effectiveness of existing online safety analysis methods on LLMs. We begin with a pilot study that validates the feasibility of detecting unsafe outputs in the early generation process. Following this, we establish the first publicly available benchmark of online safety analysis for LLMs, including a broad spectrum of methods, models, tasks, datasets, and evaluation metrics. Utilizing this benchmark, we extensively analyze the performance of state-of-the-art online safety analysis methods on both open-source and closed-source LLMs. This analysis reveals the strengths and weaknesses of individual methods and offers valuable insights into selecting the most appropriate method based on specific application scenarios and task requirements. Furthermore, we also explore the potential of using hybridization methods, i.e., combining multiple methods to derive a collective safety conclusion, to enhance the efficacy of online safety analysis. Our findings indicate a promising direction for the development of trustworthy assurance methodologies for LLMs, facilitating their reliable deployments across diverse domains.
"Online Safety Analysis for LLMs: A Benchmark, an Assessment, and a Path Forward" — IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1626–1641.
Pub Date: 2025-08-29 | DOI: 10.1109/TAI.2025.3603538
Shafique Ahmed;Ryandhimas E. Zezario;Hui-Guan Yuan;Amir Hussain;Hsin-Min Wang;Wei-Ho Chung;Yu Tsao
The prevalence of hearing aids is increasing. However, optimizing their amplification remains challenging due to the complexity of integrating multiple components in traditional methods. To address this, we present NeuroAMP, a novel deep neural network for end-to-end, personalized amplification in hearing aids. NeuroAMP leverages spectral features and the listener’s audiogram as inputs, and we explore four architectures: convolutional neural network (CNN), long short-term memory (LSTM), convolutional recurrent neural network (CRNN), and Transformer. We also introduce Speech Enhancement NeuroAMP (SE-NeuroAMP), an extension that integrates noise reduction with amplification for improved real-world performance. To enhance generalization, we employed a comprehensive data augmentation strategy during training on diverse speech (TIMIT, TMHINT) and music (Cadenza Challenge MUSIC) datasets. Evaluation using the Hearing Aid Speech Perception Index (HASPI), Hearing Aid Speech Quality Index (HASQI), and Hearing Aid Audio Quality Index (HAAQI) shows that the Transformer-based NeuroAMP achieves the best performance, with SRCC scores of 0.992 (HASPI) and 0.990 (HASQI) on TIMIT, and 0.9738 (HAAQI) on the Cadenza dataset. Notably, the augmentation strategy maintains robust performance on unseen datasets (e.g., VoiceBank-DEMAND and MUSDB18-HQ). Furthermore, SE-NeuroAMP outperforms both the conventional NAL-R+WDRC method and a two-stage baseline on the VoiceBank-DEMAND dataset, achieving HASPI of 0.90 and HASQI of 0.59. These results highlight the strong potential of NeuroAMP and SE-NeuroAMP to provide a novel and effective framework for personalized hearing aid amplification.
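The input scheme the abstract describes — spectral features conditioned on the listener's audiogram — could be wired up as below for the LSTM variant. Layer sizes, the frame-wise concatenation, and the per-band gain head are assumptions for illustration; the paper evaluates CNN, LSTM, CRNN, and Transformer variants.

```python
import torch
import torch.nn as nn

class AudiogramConditionedAmp(nn.Module):
    """Minimal sketch: each spectral frame is concatenated with the
    listener's audiogram and an LSTM predicts per-band amplification
    values, making the amplifier personalized end to end."""

    def __init__(self, n_bands=64, audiogram_dim=8, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_bands + audiogram_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_bands)

    def forward(self, spec, audiogram):
        # spec: (batch, time, n_bands); audiogram: (batch, audiogram_dim)
        a = audiogram.unsqueeze(1).expand(-1, spec.size(1), -1)
        h, _ = self.lstm(torch.cat([spec, a], dim=-1))
        return self.head(h)  # per-frame, per-band output

model = AudiogramConditionedAmp()
out = model(torch.randn(2, 100, 64), torch.randn(2, 8))
```

An SE-NeuroAMP-style extension would insert a noise-reduction stage before (or jointly with) this amplification network.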
"NeuroAMP: A Novel End-to-End General Purpose Deep Neural Amplifier for Personalized Hearing Aids" — IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1610–1625. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11145141