Pub Date: 2025-03-28 | DOI: 10.1109/TAI.2025.3556092
Yang Wang;Xue Li;Siguang Chen
Existing federated unlearning methods for eliminating the negative impact of malicious clients on the global model rely on unreasonable assumptions (e.g., an auxiliary dataset) or fail to balance model performance and efficiency. To overcome these shortcomings, we propose a malicious clients and contribution co-aware federated unlearning (MCC-Fed) method. Specifically, we introduce a method for detecting malicious clients to reduce their impact on the global model. Next, we design a contribution-aware metric that accurately quantifies the negative impact of malicious clients on the global model by calculating their historical contribution ratio. Then, based on this metric, we propose a novel federated unlearning method in which benign clients use the contribution-aware metric as a regularization term to unlearn the influence of malicious clients and restore model performance. Experimental results demonstrate that our method effectively addresses the issue of excessive unlearning, improves the efficiency of performance recovery, and enhances robustness against malicious clients. Federated unlearning effectively removes malicious clients’ influence while reducing training costs compared to retraining.
{"title":"Malicious Clients and Contribution Co-Aware Federated Unlearning","authors":"Yang Wang;Xue Li;Siguang Chen","doi":"10.1109/TAI.2025.3556092","DOIUrl":"https://doi.org/10.1109/TAI.2025.3556092","url":null,"abstract":"Existing federated unlearning methods to eliminate the negative impact of malicious clients on the global model are influenced by unreasonable assumptions (e.g., an auxiliary dataset) or fail to balance model performance and efficiency. To overcome these shortcomings, we propose a malicious clients and contribution co-aware federated unlearning (MCC-Fed) method. Specifically, we introduce a method for detecting malicious clients to reduce their impact on the global model. Next, we design a contribution-aware metric, which accurately quantifies the negative impact of malicious clients on the global calculating their historical contribution ratio. Then, based on this metric, we propose a novel federated unlearning method in which benign clients use the contribution-aware metric as a regularization term to unlearn the influence of malicious clients, and restoring model performance. Experimental results demonstrate that our method effectively addresses the issue of excessive unlearning during the unlearning process, improves the efficiency of performance recovery, and enhances robustness against malicious clients. Federated unlearning effectively removes malicious clients’ influence while reducing training costs compared to retraining.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 10","pages":"2848-2857"},"PeriodicalIF":0.0,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145196041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-27 | DOI: 10.1109/TAI.2025.3574292
Hadi Al Khansa;Mariette Awad
The field of natural language generation (NLG) has undergone remarkable expansion, largely enabled by enhanced model architectures, affordable computing, and the availability of large datasets. With NLG systems finding increasing adoption across many applications, the imperative to evaluate their performance has grown exponentially. However, relying solely on human evaluation is not scalable. To address this challenge, it is important to explore more scalable evaluation methodologies that can ensure the continued development and efficacy of NLG systems. Presently, only a few automated evaluation metrics are commonly utilized, with BLEU and ROUGE being the predominant choices. Yet, these metrics have faced criticism for their limited correlation with human judgment, their focus on surface-level similarity, and their tendency to overlook semantic nuances. While transformer-based metrics have been introduced to capture semantic similarity, our study reveals scenarios where even these metrics fail. Considering these limitations, we propose and validate a novel metric called “COSMIC,” which combines contradiction detection with contextual embedding similarity. To illustrate these limitations and showcase the performance of COSMIC, we conducted a case study using a fine-tuned LLAMA model to transform questions and short answers into declarative sentences. This task, despite its significance in generating natural language inference datasets, has not received widespread exploration since 2018. Results show that COSMIC can capture cases of contradiction between the reference and generated text while staying highly correlated with embedding similarity when the reference and generated text are consistent and semantically similar. BLEU, ROUGE, and most transformer-based metrics, by contrast, fail to identify contradictions.
{"title":"COSMIC: A Novel Contextualized Orientation Similarity Metric Incorporating Consistency for NLG Assessment","authors":"Hadi Al Khansa;Mariette Awad","doi":"10.1109/TAI.2025.3574292","DOIUrl":"https://doi.org/10.1109/TAI.2025.3574292","url":null,"abstract":"The field of natural language generation (NLG) has undergone remarkable expansion, largely enabled by enhanced model architectures, affordable computing, and the availability of large datasets. With NLG systems finding increasing adoption across many applications, the imperative to evaluate their performance has grown exponentially. However, relying solely on human evaluation for evaluation is nonscalable. To address this challenge, it is important to explore more scalable evaluation methodologies that can ensure the continued development and efficacy of NLG systems. Presently, only a few automated evaluation metrics are commonly utilized, with BLEU and ROUGE being the predominant choices. Yet, these metrics have faced criticism for their limited correlation with human judgment, their focus on surface-level similarity, and their tendency to overlook semantic nuances. While transformer metrics have been introduced to capture semantic similarity, our study reveals scenarios where even these metrics fail. Considering these limitations, we propose and validate a novel metric called “COSMIC,” which incorporates contradiction detection with contextual embedding similarity. To illustrate these limitations and showcase the performance of COSMIC, we conducted a case study using a fine-tuned LLAMA model to transform questions and short answers into declarative sentences. This task, despite its significance in generating natural language inference datasets, has not received widespread exploration since 2018. Results show that COSMIC can capture cases of contradiction between the reference and generated text while staying highly correlated with embeddings similarity when the reference and generated text are consistent and semantically similar. BLEU, ROUGE, and most transformer-based metrics demonstrate an inability to identify contradictions.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"332-346"},"PeriodicalIF":0.0,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-27 | DOI: 10.1109/TAI.2025.3574299
Chuan Xue;Jianli Gao;Zhou Gu
As machine learning technologies progress and are increasingly applied to critical and sensitive fields, the reliability issues of earlier technologies are becoming more evident. For the new generation of machine learning solutions, trustworthiness frequently takes precedence over performance when evaluating their applicability for specific applications. This manuscript introduces the IT2-ENFIS neuro-fuzzy model, a robust and trustworthy single-network solution specifically designed for data regression tasks affected by substantial label noise and outliers. The primary architecture applies interval type-2 fuzzy logic and the Sugeno inference engine. A meta-heuristic gradient-based optimizer (GBO), the Huber loss function, and the Cauchy M-estimator are employed for robust learning. IT2-ENFIS demonstrates superior performance on noise-contaminated datasets and excels in real-world scenarios, with excellent generalization capability and interpretability.
{"title":"IT2-ENFIS: Interval Type-2 Exclusionary Neuro-Fuzzy Inference System, an Attempt Toward Trustworthy Regression Learning","authors":"Chuan Xue;Jianli Gao;Zhou Gu","doi":"10.1109/TAI.2025.3574299","DOIUrl":"https://doi.org/10.1109/TAI.2025.3574299","url":null,"abstract":"As machine learning technologies progress and are increasingly applied to critical and sensitive fields, the reliability issues of earlier technologies are becoming more evident. For the new generation of machine learning solutions, trustworthiness frequently takes precedence over performance when evaluating their applicability for specific applications. This manuscript introduces the IT2-ENFIS neuro-fuzzy model, a robust and trustworthy single-network solution specifically designed for data regression tasks affected by substantial label noise and outliers. The primary architecture applies interval type-2 fuzzy logic and the Sugeno inference engine. A meta-heuristic gradient-based optimizer (GBO), the Huber loss function, and the Cauchy M-estimator are employed for robust learning. IT2-ENFIS demonstrates superior performance on noise-contaminated datasets and excels in real-world scenarios, with excellent generalization capability and interpretability.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"347-361"},"PeriodicalIF":0.0,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-26 | DOI: 10.1109/TAI.2025.3573303
Shanika Iroshi Nanayakkara;Shiva Raj Pokhrel
Quantum machine learning models, like quantum neural networks (QNNs) and quantum support vector classifiers (QSVCs), often struggle with overfitting, slow convergence, and suboptimal generalization across various datasets. This article explores the advantages of integrating deep unfolding techniques into quantum models and develops a framework covering deep unfolded variational quantum classifiers (DVQC), deep unfolded quantum neural networks (DQNN), and deep unfolded QSVC (DQSVC). Our novel unfolding transforms quantum circuit training into a sequence of learnable layers, with each layer representing an optimization step that concurrently updates both the circuit parameters and the QNN hyperparameters. The proposed framework significantly improves training and test accuracy by dynamically adjusting the learning rate, perturbations, and similar hyperparameters, particularly on complex datasets such as genomic and breast cancer data. Our experiments show that the proposed DVQC and DQNN outperform the baseline VQC and QNN, achieving 90% training accuracy and up to 20% higher test accuracy on the genomic and ad hoc datasets. DQSVC achieves 100% accuracy on the ad hoc dataset and 97% on the genomic dataset, surpassing the 90% test accuracy of traditional QSVC. Our implementation details will be publicly available.
{"title":"Modeling Deep Unfolded Quantum Machine Learning Framework","authors":"Shanika Iroshi Nanayakkara;Shiva Raj Pokhrel","doi":"10.1109/TAI.2025.3573303","DOIUrl":"https://doi.org/10.1109/TAI.2025.3573303","url":null,"abstract":"Quantum machine learning models, like quantum neural networks (QNN) and quantum support vector classifiers (QSVC), often struggle with overfitting, slow convergence, and suboptimal generalization across various datasets. This article explores the advantages of integrating deep unfolding techniques into quantum models and develops a framework focusing on deep unfolded variational quantum classifiers (DVQC), deep unfolded quantum neural networks (DQNN), and deep unfolded QSVC (DQSVC). Our novel unfolding transforms quantum circuit training into a sequence of learnable layers, with each layer representing an optimization step that concurrently renews both circuit parameters and QNN hyperparameters. The proposed framework significantly improves training and test accuracy by dynamically adjusting learning rate, perturbations, and other similar hyperparameters, particularly on complex datasets like genomic and breast cancer. Our evaluation and experiment show that proposed DVQC and DQNN outperform baseline VQC and QNN, achieving 90% training accuracy and up to 20% higher test accuracy on genomic and adhoc datasets. DQSVC achieves 100% accuracy on adhoc and 97% on genomic datasets, surpassing the 90% test accuracy of traditional QSVC. Our implementation details will be publicly available.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"321-331"},"PeriodicalIF":0.0,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-26 | DOI: 10.1109/TAI.2025.3572849
Shivam Mishra;Amit Vishwakarma;Anil Kumar
Automated nuclei segmentation is an important technique for understanding and analyzing cellular characteristics; it eases computer-aided digital pathology and is useful for disease diagnosis. However, this task is difficult because of the diversity in nuclei size, blurry boundaries, and the variety of imaging modalities. A convolutional neural network (CNN)-based multiheaded U-Net (M-UNet) framework has been proposed to address such issues. This architecture uses filters of different kernel sizes for multiple heads to extract multiresolution features of an image. A shearlet-based unsharp masking (SBUM) method is proposed for preprocessing, which primarily emphasizes features such as contours, boundaries, and minute details of the source image. In this article, a hybrid loss function is formulated that combines intersection over union (IOU) loss and Dice loss with binary cross entropy loss. The optimization algorithm minimizes this hybrid loss, and higher metric values during the testing phase indicate better segmentation performance in the spatial domain. The proposed method yields superior segmentation images and quantitative findings compared to state-of-the-art nuclei segmentation techniques, attaining IOU, F1-score, accuracy, and precision values of 0.8325, 0.9086, 0.9651, and 0.9001, respectively.
{"title":"Nuclei Segmentation Using Multiheaded U-Net and Shearlet-Based Unsharp Masking","authors":"Shivam Mishra;Amit Vishwakarma;Anil Kumar","doi":"10.1109/TAI.2025.3572849","DOIUrl":"https://doi.org/10.1109/TAI.2025.3572849","url":null,"abstract":"An automated nuclei segmentation is an important technique for understanding and analyzing cellular characteristics that ease computer-aided digital pathology and are useful for disease diagnosis. However, this task is difficult because of the diversity in nuclei size, blurry boundaries, and several imaging modalities. A convolutional neural network (CNN)-based multiheaded U-Net (M-UNet) framework has been proposed to address such issues. This architecture uses filters of different kernel sizes for multiple heads to extract multiresolution features of an image. Shearlet-based unsharp masking (SBUM) method is proposed for preprocessing, which primarily emphasizes features like contours, boundaries, and minute details of the source image. In this article, a hybrid loss function is formulated, which includes intersection over union (IOU) loss and Dice loss along with binary cross entropy loss. The hybrid loss function is tried to be minimized by the optimization algorithm, and the higher metrics values during the testing phase represent better segmentation performance in the spatial domain. The proposed method yields superior segmentation images and quantitative findings as compared to the state-of-the-art nuclei segmentation techniques. The proposed technique attains IOU, F1Score, accuracy, and precision values of 0.8325, 0.9086, 0.9651, and 0.9001, respectively.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"297-307"},"PeriodicalIF":0.0,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-24 | DOI: 10.1109/TAI.2025.3554461
Md Abrar Jahin;Md. Akmol Masud;Md Wahiduzzaman Suva;M. F. Mridha;Nilanjan Dey
The rapid data surge from the high-luminosity Large Hadron Collider introduces critical computational challenges requiring novel approaches for efficient data processing in particle physics. Quantum machine learning, with its capability to leverage the extensive Hilbert space of quantum hardware, offers a promising solution. However, current quantum graph neural networks (GNNs) lack robustness to noise and are often constrained by fixed symmetry groups, limiting adaptability in complex particle interaction modeling. This article demonstrates that replacing the classical Lorentz group equivariant block modules in LorentzNet with a dressed quantum circuit significantly enhances performance despite using approximately 5.5 times fewer parameters. Additionally, quantum circuits effectively replace MLPs by inherently preserving symmetries, with Lorentz symmetry integration ensuring robust handling of relativistic invariance. Our Lorentz-equivariant quantum graph neural network (Lorentz-EQGNN) achieved 74.00% test accuracy and an AUC of 87.38% on the Quark-Gluon jet tagging dataset, outperforming the classical and quantum GNNs with a reduced architecture using only 4 qubits. On the electron–photon dataset, Lorentz-EQGNN reached 67.00% test accuracy and an AUC of 68.20%, demonstrating competitive results with just 800 training samples. Evaluation of our model on generic MNIST and FashionMNIST datasets confirmed Lorentz-EQGNN’s efficiency, achieving 88.10% and 74.80% test accuracy, respectively. Ablation studies validated the impact of quantum components on performance, with notable improvements in background rejection rates over classical counterparts. These results highlight Lorentz-EQGNN’s potential for immediate applications in noise-resilient jet tagging, event classification, and broader data-scarce HEP tasks.
{"title":"Lorentz-Equivariant Quantum Graph Neural Network for High-Energy Physics","authors":"Md Abrar Jahin;Md. Akmol Masud;Md Wahiduzzaman Suva;M. F. Mridha;Nilanjan Dey","doi":"10.1109/TAI.2025.3554461","DOIUrl":"https://doi.org/10.1109/TAI.2025.3554461","url":null,"abstract":"The rapid data surge from the high-luminosity Large Hadron Collider introduces critical computational challenges requiring novel approaches for efficient data processing in particle physics. Quantum machine learning, with its capability to leverage the extensive Hilbert space of quantum hardware, offers a promising solution. However, current quantum graph neural networks (GNNs) lack robustness to noise and are often constrained by fixed symmetry groups, limiting adaptability in complex particle interaction modeling. This article demonstrates that replacing the classical Lorentz group equivariant block modules in LorentzNet with a dressed quantum circuit significantly enhances performance despite using <inline-formula><tex-math>$approx 5.5$</tex-math></inline-formula> times fewer parameters. Additionally, quantum circuits effectively replace MLPs by inherently preserving symmetries, with Lorentz symmetry integration ensuring robust handling of relativistic invariance. Our <underline>Lorentz</u>-<underline>e</u>quivariant <underline>q</u>uantum <underline>g</u>raph <underline>n</u>eural <underline>n</u>etwork (Lorentz-EQGNN) achieved 74.00% test accuracy and an AUC of 87.38% on the Quark-Gluon jet tagging dataset, outperforming the classical and quantum GNNs with a reduced architecture using only 4 qubits. On the electron–photon dataset, Lorentz-EQGNN reached 67.00% test accuracy and an AUC of 68.20%, demonstrating competitive results with just 800 training samples. Evaluation of our model on generic MNIST and FashionMNIST datasets confirmed Lorentz-EQGNN’s efficiency, achieving 88.10% and 74.80% test accuracy, respectively. Ablation studies validated the impact of quantum components on performance, with notable improvements in background rejection rates over classical counterparts. These results highlight Lorentz-EQGNN’s potential for immediate applications in noise-resilient jet tagging, event classification, and broader data-scarce HEP tasks.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3195-3206"},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-23 | DOI: 10.1109/TAI.2025.3572852
Bikash K. Behera;Saif Al-Kuwari;Ahmed Farouk
A brain–computer interface (BCI) system enables direct communication between the brain and external devices, offering significant potential for assistive technologies and advanced human–computer interaction. Despite progress, BCI systems face persistent challenges, including signal variability, classification inefficiency, and difficulty adapting to individual users in real time. In this study, we propose a novel hybrid quantum learning model, termed QSVM-QNN, which integrates a quantum support vector machine (QSVM) with a quantum neural network (QNN), to improve classification accuracy and robustness in EEG-based BCI tasks. Unlike existing models, QSVM-QNN combines the decision boundary capabilities of QSVM with the expressive learning power of QNN, leading to superior generalization performance. The proposed model is evaluated on two benchmark EEG datasets, achieving high accuracies of 0.990 and 0.950, outperforming both classical and standalone quantum models. To demonstrate real-world viability, we further validated the robustness of QNN, QSVM, and QSVM-QNN against six realistic quantum noise models, including bit flip and phase damping. These experiments reveal that QSVM-QNN maintains stable performance under noisy conditions, establishing its applicability for deployment in practical, noisy quantum environments. Beyond BCI, the proposed hybrid quantum architecture is generalizable to other biomedical and time-series classification tasks, offering a scalable and noise-resilient solution for next-generation neurotechnological systems.
{"title":"QSVM-QNN: Quantum Support Vector Machine Based Quantum Neural Network Learning Algorithm for Brain–Computer Interfacing Systems","authors":"Bikash K. Behera;Saif Al-Kuwari;Ahmed Farouk","doi":"10.1109/TAI.2025.3572852","DOIUrl":"https://doi.org/10.1109/TAI.2025.3572852","url":null,"abstract":"A brain–computer interface (BCI) system enables direct communication between the brain and external devices, offering significant potential for assistive technologies and advanced human–computer interaction. Despite progress, BCI systems face persistent challenges, including signal variability, classification inefficiency, and difficulty adapting to individual users in real time. In this study, we propose a novel hybrid quantum learning model, termed QSVM-QNN, which integrates a quantum support vector machine (QSVM) with a quantum neural network (QNN), to improve classification accuracy and robustness in EEG-based BCI tasks. Unlike existing models, QSVM-QNN combines the decision boundary capabilities of QSVM with the expressive learning power of QNN, leading to superior generalization performance. The proposed model is evaluated on two benchmark EEG datasets, achieving high accuracies of 0.990 and 0.950, outperforming both classical and standalone quantum models. To demonstrate real-world viability, we further validated the robustness of QNN, QSVM, and QSVM-QNN against six realistic quantum noise models, including bit flip and phase damping. These experiments reveal that QSVM-QNN maintains stable performance under noisy conditions, establishing its applicability for deployment in practical, noisy quantum environments. Beyond BCI, the proposed hybrid quantum architecture is generalizable to other biomedical and time-series classification tasks, offering a scalable and noise-resilient solution for next-generation neurotechnological systems.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"308-320"},"PeriodicalIF":0.0,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-19 | DOI: 10.1109/TAI.2025.3552686
Hang Wang;David J. Miller;George Kesidis
Sources of overfitting in deep neural net (DNN) classifiers include: 1) large class imbalances; 2) insufficient training set diversity; and 3) overtraining. Recently, it was shown that backdoor data-poisoning also induces overfitting, with unusually large maximum classification margins (MMs) to the attacker’s target class. This is enabled by (unbounded) ReLU activation functions, which allow large signals to propagate in the DNN. Thus, an effective posttraining backdoor mitigation approach (with no knowledge of the training set and no knowledge or control of the training process) was proposed, informed by a small, clean (poisoning-free) data set and choosing saturation levels on neural activations to limit the DNN’s MMs. Here, we show that nonmalicious sources of overfitting also exhibit unusually large MMs. Thus, we propose novel posttraining MM-based regularization that substantially mitigates nonmalicious overfitting due to class imbalances and overtraining. Whereas backdoor mitigation and other adversarial learning defenses often trade off a classifier’s accuracy to achieve robustness against attacks, our approach, inspired by ideas from adversarial learning, helps the classifier’s generalization accuracy: as shown for CIFAR-10 and CIFAR-100, our approach improves both the accuracy for rare categories as well as overall. Moreover, unlike other overfitting mitigation methods, it does so with no knowledge of class imbalances, no knowledge of the training set, and without control of the training process.
{"title":"Maximum Margin-Based Activation Clipping for Posttraining Overfitting Mitigation in DNN Classifiers","authors":"Hang Wang;David J. Miller;George Kesidis","doi":"10.1109/TAI.2025.3552686","DOIUrl":"https://doi.org/10.1109/TAI.2025.3552686","url":null,"abstract":"Sources of overfitting in deep neural net (DNN) classifiers include: 1) large class imbalances; 2) insufficient training set diversity; and 3) over-training. Recently, it was shown that backdoor data-poisoning <italic>also</i> induces overfitting, with unusually large maximum classification margins (MMs) to the attacker’s target class. This is enabled by (unbounded) ReLU activation functions, which allow large signals to propagate in the DNN. Thus, an effective <italic>posttraining</i> backdoor mitigation approach (with no knowledge of the training set and no knowledge or control of the training process) was proposed, informed by a small, clean (poisoning-free) data set and choosing saturation levels on neural activations to limit the DNN’s MMs. Here, we show that nonmalicious sources of overfitting <italic>also</i> exhibit unusually large MMs. Thus, we propose novel posttraining MM-based regularization that substantially mitigates <italic>nonmalicious</i> overfitting due to class imbalances and overtraining. Whereas backdoor mitigation and other adversarial learning defenses often <italic>trade off</i> a classifier’s accuracy to achieve robustness against attacks, our approach, inspired by ideas from adversarial learning, <italic>helps</i> the classifier’s generalization accuracy: as shown for CIFAR-10 and CIFAR-100, our approach improves both the accuracy for rare categories as well as overall. Moreover, unlike other overfitting mitigation methods, it does so with no knowledge of class imbalances, no knowledge of the training set, and without control of the training process.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 10","pages":"2840-2847"},"PeriodicalIF":0.0,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145196042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-18 | DOI: 10.1109/TAI.2025.3552687
Junpeng Li;Shuying Huang;Changchun Hua;Yana Yang
Learning from pairwise similarity and unlabeled data (SU) is a recently emerging weakly-supervised learning method, which learns a classifier from similar data pairs (two instances belonging to the same class) and unlabeled data. However, this framework cannot handle triplet similarities and unlabeled data. To address this limitation, this article develops a framework for learning from triplet similarities (three instances belonging to the same class) and unlabeled data points, denoted as TSU. This framework not only demonstrates the feasibility of constructing a TSU classifier but also motivates the broader challenge of addressing N-tuple similarities (N ≥ 2) and unlabeled data points. To tackle this more general problem, the article then develops a weakly-supervised framework for learning from N-tuple similarities (N instances belonging to the same class) and unlabeled data points, named NSU. This framework provides a solid foundation for handling diverse similarity scenarios. Based on these findings, we propose empirical risk minimization estimators for both TSU and NSU classification and establish estimation error bounds for the proposed methods. Finally, experiments verify the effectiveness of the proposed algorithms.
{"title":"Learning From N-Tuple Similarities and Unlabeled Data","authors":"Junpeng Li;Shuying Huang;Changchun Hua;Yana Yang","doi":"10.1109/TAI.2025.3552687","DOIUrl":"https://doi.org/10.1109/TAI.2025.3552687","url":null,"abstract":"Learning from pairwise similarity and unlabeled data (SU) is a recently emerging weakly-supervised learning method, which learns a classifier from similar data pairs (two instances belonging to the same class) and unlabeled data. However, this framework is insoluble for triplet similarities and unlabeled data. To address this limitation, this article develops a framework for learning from triplet similarities (three instances belonging to the same class) and unlabeled data points, denoted as TSU. This framework not only showcases the feasibility of constructing a TSU classifier but also serves as an inspiration to explore the broader challenge of addressing N-tuple similarities (N ≥ 2) and unlabeled data points. To tackle this more generalized problem, the present article develops an advancing weakly-supervision framework of learning from N-tuple similarities (N instances belong to the same class) and unlabeled data points, named NSU. This framework provides a solid foundation for handling diverse similarity scenarios. Based on these findings, we propose empirical risk minimization estimators for both TSU and NSU classification. The estimation error bounds are also established for the proposed methods. Finally, experiments are performed to verify the effectiveness of the proposed algorithm.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 9","pages":"2542-2551"},"PeriodicalIF":0.0,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144926901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-17 | DOI: 10.1109/TAI.2025.3550458
Xinge Zhao;Chien Chern Cheah
Recent advancements in learning from graph-structured data have highlighted the importance of graph convolutional networks (GCNs). Despite some research efforts on the theoretical aspects of GCNs, a gap remains in understanding their training process, especially concerning convergence analysis. This study introduces a two-stage training methodology for GCNs, incorporating both pretraining and fine-tuning phases. A two-layer GCN model is used for the convergence analysis and case studies. The convergence analysis that employs a Lyapunov-like approach is performed on the proposed learning algorithm, providing conditions to ensure the convergence of the model learning. Additionally, an automated learning rate scheduler is proposed based on the convergence conditions to prevent divergence and eliminate the need for manual tuning of the initial learning rate. The efficacy of the proposed method is demonstrated through case studies on the node classification problem. The results reveal that the proposed method outperforms gradient descent-based optimizers by achieving consistent training accuracies within a variation of 0.1% across various initial learning rates, without requiring manual tuning.
{"title":"Ensuring Reliable Learning in Graph Convolutional Networks: Convergence Analysis and Training Methodology","authors":"Xinge Zhao;Chien Chern Cheah","doi":"10.1109/TAI.2025.3550458","DOIUrl":"https://doi.org/10.1109/TAI.2025.3550458","url":null,"abstract":"Recent advancements in learning from graph-structured data have highlighted the importance of graph convolutional networks (GCNs). Despite some research efforts on the theoretical aspects of GCNs, a gap remains in understanding their training process, especially concerning convergence analysis. This study introduces a two-stage training methodology for GCNs, incorporating both pretraining and fine-tuning phases. A two-layer GCN model is used for the convergence analysis and case studies. The convergence analysis that employs a Lyapunov-like approach is performed on the proposed learning algorithm, providing conditions to ensure the convergence of the model learning. Additionally, an automated learning rate scheduler is proposed based on the convergence conditions to prevent divergence and eliminate the need for manual tuning of the initial learning rate. The efficacy of the proposed method is demonstrated through case studies on the node classification problem. The results reveal that the proposed method outperforms gradient descent-based optimizers by achieving consistent training accuracies within a variation of 0.1% across various initial learning rates, without requiring manual tuning.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 9","pages":"2510-2525"},"PeriodicalIF":0.0,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144926893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}