Pub Date: 2025-06-30. DOI: 10.1109/TAI.2025.3584288
Ziyan Zhang;Fei Xu;Bo Jiang;Jin Tang
To alleviate the local receptive field issue of graph convolutional networks (GCNs), transformers have been exploited to capture the long-range dependencies of nodes for graph data representation and learning. However, existing graph transformers generally employ a regular self-attention module for all node-to-node message passing, which must learn the affinities/relationships between all pairs of nodes, leading to high computational cost. They are also usually sensitive to graph noise. To overcome these issues, we propose a novel graph transformer architecture, termed anchor graph transformer (AGFormer), by leveraging an anchor graph model. Specifically, AGFormer first obtains some representative anchors and then converts node-to-node message passing into anchor-to-anchor and anchor-to-node message passing processes. Thus, AGFormer performs much more efficiently and robustly than regular node-to-node transformers. Extensive experiments on several benchmark datasets demonstrate the benefits of the proposed AGFormer. In particular, when the number of graph nodes reaches 15 000, AGFormer achieves a training speed three times faster than that of GraphTrans. Furthermore, AGFormer performs more robustly than GraphTrans on the noised NCI109 dataset.
{"title":"Efficient Graph Representation With Anchor-Graph Transformer","authors":"Ziyan Zhang;Fei Xu;Bo Jiang;Jin Tang","doi":"10.1109/TAI.2025.3584288","DOIUrl":"https://doi.org/10.1109/TAI.2025.3584288","url":null,"abstract":"To alleviate the local receptive issue of graph convolutional network (GCN), transformers have been exploited to capture the long-range dependence of nodes for graph data representation and learning. However, existing graph transformers generally employ a regular self-attention module for all node-to-node message passing, which needs to learn the affinities/relationships between all node’s pairs, leading to high computational cost issue. Also, they are usually sensitive to graph noises. To overcome this issue, we propose a novel graph transformer architecture, termed anchor graph transformer (AGFormer), by leveraging an anchor graph model. To be specific, AGFormer first obtains some representative anchors and then converts node-to-node message passing into anchor-to-anchor and anchor-to-node message passing processes. Thus, AGFormer performs much more efficiently and also robustly than regular node-to-node transformers. Extensive experiments on several benchmark datasets demonstrate the benefits of the proposed AGFormer. Specifically, when the number of graph nodes reaches 15 000, AGFormer achieves a training speed that is three times faster than that of GraphTrans. Furthermore, AGFormers perform more robustly on the noised NCI109 dataset compared to GraphTrans.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 2","pages":"1201-1209"},"PeriodicalIF":0.0,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146176001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-12. DOI: 10.1109/TAI.2025.3577605
Tanish Singh Rajpal;Akshit Naithani
In response to the critical vulnerabilities exposed by quantum computing and AI-driven cryptanalysis in traditional encryption systems, this article introduces NeuroCrypt—a neuro-symbolic AI framework that synergizes adaptive cryptography, decentralized governance, and postquantum security. NeuroCrypt employs three AI groups, CryptAI (multialgorithm encryption), GenAI (neuro-symbolic algorithm synthesis), and TestAI (adversarial validation), to dynamically generate and deploy quantum-resistant cryptographic techniques. The framework uniquely combines five-layer encryption (randomly ordered classical and AI-generated algorithms, e.g., lattice–chaotic hybrids) with metadata-driven security, where encrypted logic is distributed via Shamir’s secret sharing (SSS) over VPNs, eliminating key-exchange dependencies. A permissioned blockchain enforces tamper-proof updates validated by TestAI consensus ($n/2 + 1$ threshold), while dynamic threshold adaptation adjusts SSS shard requirements based on real-time threat levels. Evaluations demonstrate NeuroCrypt’s superiority: 2.3$\times$ higher entropy than AES-256, 94.3% shard survival under 30% compromise, and 220 ms encryption latency for 1 MB of data on edge devices. The system’s lattice-based encryption (1024-dimensional) and frequent AI-driven updates resist Shor/Grover attacks, validated through simulated quantum oracles achieving $\mathcal{O}(10^{38})$ operations for 256-bit keys. Compliance with GDPR, NIST PQC, and FIPS 140-2 ensures readiness for healthcare, fintech, and government applications. NeuroCrypt’s architecture—backward-compatible with legacy systems and optimized for IoT/cloud ecosystems—sets a precedent for self-evolving security, offering a 15% storage overhead trade-off for metadata-driven keyless decryption. Future work will optimize edge-device performance and integrate 6G network protocols, establishing NeuroCrypt as a foundational framework for postquantum cybersecurity.
{"title":"NeuroCrypt: A Neuro Symbolic AI Ecosystem for Advanced Cryptographic Data Security and Transmission","authors":"Tanish Singh Rajpal;Akshit Naithani","doi":"10.1109/TAI.2025.3577605","DOIUrl":"https://doi.org/10.1109/TAI.2025.3577605","url":null,"abstract":"In response to the critical vulnerabilities exposed by quantum computing and AI-driven cryptanalysis in traditional encryption systems, this article introduces <italic>NeuroCrypt</i>—a neuro-symbolic AI framework that synergizes adaptive cryptography, decentralized governance, and postquantum security. NeuroCrypt employs three AI groups: <italic>CryptAI</i> (multialgorithm encryption), <italic>GenAI</i> (neuro-symbolic algorithm synthesis), and <italic>TestAI</i> (adversarial validation), to dynamically generate and deploy quantum-resistant cryptographic techniques. The framework uniquely combines five-layer encryption (randomly ordered classical and AI-generated algorithms, e.g., lattice–chaotic hybrids) with metadata-driven security, where encrypted logic is distributed via Shamir’s secret sharing (SSS) over VPNs, eliminating key-exchange dependencies. A permissioned blockchain enforces tamper-proof updates validated by <italic>TestAI</i> consensus (<inline-formula><tex-math>$n/2 + 1$</tex-math></inline-formula> threshold), while dynamic threshold adaptation adjusts SSS shard requirements based on real-time threat levels. Evaluations demonstrate NeuroCrypt’s superiority: 2.3<inline-formula><tex-math>$times$</tex-math></inline-formula> higher entropy than AES-256, 94.3% shard survival under 30% compromise, and 220 ms encryption latency for 1 MB data on edge devices. The system’s lattice-based encryption (1024-dimensional) and frequent AI-driven updates resist Shor/Grover attacks, validated through simulated quantum oracles achieving <inline-formula><tex-math>$mathcal{O}(10^{38})$</tex-math></inline-formula> operations for 256-bit keys. Compliance with GDPR, NIST PQC, and FIPS 140-2 ensures readiness for healthcare, fintech, and government applications. NeuroCrypt’s architecture—backward-compatible with legacy systems and optimized for IoT/cloud ecosystems—sets a precedent for self-evolving security, offering a 15% storage overhead trade-off for metadata-driven keyless decryption. Future work will optimize edge-device performance and integrate 6G network protocols, establishing NeuroCrypt as a foundational framework for postquantum cybersecurity.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"512-521"},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-12. DOI: 10.1109/TAI.2025.3579459
Divya Patel;Vansh Parikh;Om Patel;Agam Shah;Bhaskar Chaudhury
In this work, we apply topic modeling using nonnegative matrix factorization (NMF) to the COVID-19 open research dataset (CORD-19) to uncover the underlying thematic structure, and its evolution, within the extensive body of COVID-19 research literature. NMF factorizes the document-term matrix into two nonnegative matrices that represent the topics and their distribution across the documents, showing how strongly documents relate to topics and how topics relate to words. We describe the complete methodology, which involves a series of rigorous preprocessing steps that standardize the available text data while preserving the context of phrases, followed by feature extraction using term frequency-inverse document frequency (tf-idf), which weights words based on their frequency and rarity in the dataset. To ensure the robustness of our topic model, we conduct a stability analysis that assesses the stability scores of the NMF topic model for different numbers of topics, enabling us to select the optimal number of topics for our analysis. Through this analysis, we track the evolution of topics over time within the CORD-19 dataset. Our findings contribute to the understanding of the knowledge structure of the COVID-19 research landscape, providing a valuable resource for future research in this field.
Title: Exploring Topic Trends in COVID-19 Research Literature Using Nonnegative Matrix Factorization (IEEE Transactions on Artificial Intelligence, vol. 7, no. 1, pp. 586–595).
Pub Date: 2025-06-10. DOI: 10.1109/TAI.2025.3578007
Zhenqin Chen;Yiwei Lin;Qiong Luo;Jinshan Xu
Fetal electrocardiography (FECG) is a crucial tool for assessing fetal cardiac health and pregnancy status. Direct invasive FECG provides reliable fetal heart rate signals, but it poses risks and is limited to use during labor. Conversely, noninvasive monitoring of the fetal heart is possible via abdominal electrocardiography (AECG), which detects fetal heart waveforms using electrodes positioned on the mother’s abdomen. However, this method is often subject to interference from maternal cardiac activity and other external sources. To address this issue, we propose a novel diffusion method, DIFF-FECG, aimed at improving the extraction of FECG signals from AECG recordings. This method leverages a condition-driven diffusion process to learn specific conditional probability distributions, enabling the effective separation of high-quality FECG signals from noisy AECG data. By adaptively managing the inherent non-Gaussian noise characteristics of the maternal ECG (MECG) within the AECG, DIFF-FECG achieves more effective FECG reconstruction. Furthermore, the quality of the generated FECG signals is enhanced by adding a reconstruction loss and performing multiple reconstructions. Experimental results on two public databases demonstrate that the proposed DIFF-FECG method yields satisfactory results, with an average Pearson correlation coefficient of 0.922 for the estimated FECG. These findings underscore the potential of diffusion probabilistic models in advancing FECG signal extraction techniques, thereby contributing to improved fetal health monitoring.
{"title":"DIFF-FECG: A Conditional Diffusion-Based Method for Fetal ECG Extraction From Abdominal ECG","authors":"Zhenqin Chen;Yiwei Lin;Qiong Luo;Jinshan Xu","doi":"10.1109/TAI.2025.3578007","DOIUrl":"https://doi.org/10.1109/TAI.2025.3578007","url":null,"abstract":"Fetal electrocardiography (FECG) is a crucial tool for assessing fetal cardiac health and pregnancy status. Direct invasive FECG provides reliable fetal heart rate signals, but poses risks and is limited to use during labor. Conversely, non-invasive monitoring of the fetal heart is possible via abdominal electrocardiography (AECG), which detects fetal heart waveforms using electrodes positioned on the mother’s abdomen. However, this method is often subject to interference from maternal cardiac activity and other external sources. To address this issue, we propose a novel diffusion method, DIFF-FECG, aimed at improving the extraction of FECG signals from AECG recordings. This method leverages a condition-driven diffusion process to learn specific conditional probability distributions, enabling the effective separation of high-quality FECG signals from noisy AECG data. By adaptively managing the inherent non-Gaussian noise characteristics of MECG within the AECG, DIFF-FECG achieves more effective FECG reconstruction. Furthermore, the quality of the generated FECG signals is also enhanced by adding reconstruction loss and multiple reconstructions. Experimental results on two public databases demonstrate that the proposed DIFF-FECG method yields satisfactory results, with an average Pearson correlation coefficient of 0.922 for the estimated FECG. These findings underscore the potential of diffusion probabilistic models in advancing FECG signal extraction techniques, thereby contributing to improved fetal health monitoring.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"534-546"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-10. DOI: 10.1109/TAI.2025.3578305
Jobin Wilson;Santanu Chaudhury;Brejesh Lall
Ensemble learning is one of the most successful approaches to concept-drift adaptation due to its versatility and high predictive performance. However, a practical challenge in using ensembles for high-speed data stream mining is the associated large computational cost. In this article, we introduce a computationally efficient heterogeneous ensemble classifier named successive halving ensemble (SUHEN), which adapts to concept drift using online ensemble selection. We model ensemble selection as a fixed-budget best-arm identification bandit problem and solve it with the successive halving algorithm (SHA). SUHEN identifies the single best-performing member for a stream segment and uses it for training and prediction until a drift is detected. Upon detecting drift, SHA identifies the new best performer for the segment. As stream characteristics evolve, manually choosing a fixed SHA budget would be challenging. To this end, we extend SUHEN by posing budget selection as a hyperparameter tuning problem and solving it using meta-learning. Our evaluation on 20 benchmark datasets reveals that SUHEN provides accuracy statistically on par with state-of-the-art ensemble algorithms while providing significant computational resource savings. This makes our proposal attractive for high-speed stream mining problems in resource-constrained settings.
{"title":"Successive Halving Based Online Ensemble Selection for Concept-Drift Adaptation","authors":"Jobin Wilson;Santanu Chaudhury;Brejesh Lall","doi":"10.1109/TAI.2025.3578305","DOIUrl":"https://doi.org/10.1109/TAI.2025.3578305","url":null,"abstract":"Ensemble learning is one of the most successful approaches for concept-drift adaptation due to its versatility and high predictive performance. However, a practical challenge in using ensembles for high-speed data stream mining is the associated large computational cost. In this article, we introduce a computationally efficient heterogeneous ensemble classifier named successive halving ensemble (SUHEN) which adapts to concept-drift using online ensemble selection. We model ensemble selection as a fixed budget best arm identification bandit problem and solve it using successive halving algorithm (SHA). SUHEN identifies a single best performing member for a stream segment and utilizes it for training and prediction until a drift is detected. Upon detecting drift, SHA identifies the new best performer for the segment. As stream characteristics evolve, manually choosing a fixed SHA budget would be challenging. To this end, we extend SUHEN by posing budget selection as a hyperparameter tuning problem and solve it using meta-learning. Our evaluation on 20 benchmark datasets reveal that SUHEN provides accuracy statistically at par with state-of-the-art ensemble algorithms, while providing significant computational resource savings. This makes our proposal attractive for high-speed stream mining problems in resource-constrained settings.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"547-561"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-10. DOI: 10.1109/TAI.2025.3578011
Zhimin Zhou;Lin Zhao
In this letter, a predefined-time adaptive fuzzy trajectory tracking control scheme based on an event-triggered quantization framework is proposed for a quadrotor with inertial uncertainty, full-state constraints, and actuator saturation. First, a double-threshold event-triggered quantization mechanism is proposed to adaptively adjust the discretization degree of the control signals, reducing the communication burden while balancing control accuracy. Subsequently, the computational complexity and filter error problems are addressed by constructing a command filter and a filter error compensation mechanism. The unknown nonlinear dynamics of the quadrotor are handled through the approximation capability of an adaptive fuzzy logic system. In addition, an auxiliary signal and a smooth approximation function are combined to cope with actuator saturation. Using Lyapunov theory, the predefined-time stability of the system under full-state constraints is proven. Finally, the validity and superiority of the proposed algorithm are verified through a simulation example.
{"title":"Event-Triggered Quantization-Based Predefined-Time Adaptive Fuzzy Control for Quadrotor Trajectory Tracking","authors":"Zhimin Zhou;Lin Zhao","doi":"10.1109/TAI.2025.3578011","DOIUrl":"https://doi.org/10.1109/TAI.2025.3578011","url":null,"abstract":"In this letter, a predefined-time adaptive fuzzy trajectory tracking control based on an event-triggered quantization framework is proposed for a quadrotor with inertial uncertainty, full-state constraints, and actuator saturation. First, a double-threshold event-triggered quantization mechanism is proposed to adaptively adjust the discretization degree of the control signals, reducing the communication burden while balancing the control accuracy. Subsequently, the computational complexity and filter error problems are solved by constructing the command filter and filter error compensation mechanism. The unknown nonlinear dynamics of the quadrotor are handled through the approximation capability of an adaptive fuzzy logic system. In addition, an auxiliary signal and a smooth approximation function are combined to cope with actuator saturation. Using Lyapunov theory, the predefined-time stability of the system under full-state constraints is proven. Finally, the validity and superiority of the proposed algorithm have been verified through the simulation example.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"596-605"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-10. DOI: 10.1109/TAI.2025.3578004
Kuan Huang;Meng Xu;Yingfeng Wang
The success of adversarial attack methods suggests that a small input change may mislead a trained machine-learning model. For example, changing one pixel of an image may cause the trained model to misclassify the updated image. Uncertainty quantification is crucial for detecting misclassifications; hence, precise uncertainty quantification, meaning uncertainty estimates that closely align with prediction correctness, is essential. We assume that misclassified samples should exhibit high uncertainty while correctly classified samples should exhibit low uncertainty. To evaluate the performance of uncertainty quantification, we investigate the task of uncertainty-based misclassification detection under adversarial attack conditions. Our findings suggest that, due to training issues, existing uncertainty quantification methods are unable to accurately identify misclassified predictions resulting from adversarial attacks. We propose a simple adversarial training strategy for improving uncertainty quantification. Our results show that adversarial training improves the reliability of uncertainty quantification by better aligning uncertainty with prediction correctness. Specifically, we observe consistent improvements in misclassification detection performance, measured by AUC-ROC and AUC-PR, across clean and adversarial samples.
{"title":"Using Adversarial Training to Improve Uncertainty Quantification","authors":"Kuan Huang;Meng Xu;Yingfeng Wang","doi":"10.1109/TAI.2025.3578004","DOIUrl":"https://doi.org/10.1109/TAI.2025.3578004","url":null,"abstract":"The success of adversarial attack methods suggests a small input change may mislead a trained machine-learning model. For example, changing one pixel of an image may cause the trained model to misclassify this updated image. Uncertainty quantification is crucial for detecting misclassifications; hence, precise uncertainty quantification, meaning uncertainty estimates that closely align with prediction correctness, is essential. We assume that misclassified samples should exhibit high uncertainty while correctly classified samples should exhibit low uncertainty. To evaluate the performance of uncertainty quantification, we investigate the task of uncertainty-based misclassification detection under adversarial attack conditions. Our findings suggest that existing uncertainty quantification methods are unable to accurately identify misclassified predictions resulting from adversarial attacks due to training issues. We propose a simple adversarial training strategy for improving uncertainty quantification. Our results show that adversarial training improves the reliability of uncertainty quantification by better aligning uncertainty with prediction correctness. Specifically, we observe consistent improvements in misclassification detection performance, measured by AUC-ROC and AUC-PR, across clean and adversarial samples.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"522-533"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-06. DOI: 10.1109/TAI.2025.3577601
Xinran Wu;Kun Yue;Liang Duan;Hongbo Xie;Huashuai Liu
Intelligent systems can become increasingly powerful by applying probabilistic inference over the dependence relations among observed and latent variables, which can be represented by a Bayesian network (BN) with multiple latent (BNML) variables. As the critical task in BNML construction, parameter learning is fulfilled in most existing methods by extending the classic EM algorithm, but the time complexity is exponential in the number of latent variables. To address this issue, we first propose to reduce the number of latent variables by training a vector quantized variational autoencoder (VQVAE). Specifically, we incorporate the initial probability parameters in the conditional probability tables (CPTs) of the BNML as a regularization term of the VQVAE to guarantee that the probability parameters after reduction are similar (i.e., consistent) to those before reduction. Then, we incorporate efficient gradient calculations to augment the EM algorithm and propose an efficient algorithm for parameter learning of the BN with reduced latent (BNRL) variables. Finally, we present an efficient method for probabilistic inference in the BNRL by encoding evidence variables, decoding query variables, and updating query variable values via backpropagation. Experimental results on real and synthetic BNs demonstrate that our method outperforms state-of-the-art methods in efficiency and effectiveness.
{"title":"Deep Variational Autoencoder-Based Parameter Learning of Bayesian Network With Multiple Latent Variables","authors":"Xinran Wu;Kun Yue;Liang Duan;Hongbo Xie;Huashuai Liu","doi":"10.1109/TAI.2025.3577601","DOIUrl":"https://doi.org/10.1109/TAI.2025.3577601","url":null,"abstract":"Intelligent systems could be increasingly powerful by applying probabilistic inferences over the dependence relations among observed and latent variables, which could be represented by the Bayesian network (BN) with multiple latent (BNML) variables. As the critical task in BNML construction, parameter learning is fulfilled by extending the classic EM algorithm in most of the existing methods, but the time complexity is exponential to the number of latent variables. To address this issue, we first propose to reduce the number of latent variables by training a vector quantized variational autoencoder (VQVAE). Specifically, we incorporate the initial probability parameters in conditional probability tables (CPTs) of BNML as the regularization term of VQVAE to guarantee that the probability parameters after reduction are similar (i.e., consistent) to those before reduction. Then, we incorporate efficient gradient calculations to augment the EM algorithm and propose the efficient algorithm for parameter learning of the BN with reduced latent (BNRL) variables. Finally, we present the efficient method for probabilistic inferences in BNRL by encoding evidence variable, decoding query variables and updating query variable values via backpropagation. Experimental results on real and synthetic BNs demonstrate that our method outperforms the state-of-the-art methods on efficiency and effectiveness.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"497-511"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-06. DOI: 10.1109/TAI.2025.3577119
Kai Zhong;Hengchang Zhu;Xiaoming Zhang;Darong Huang;Min Han
Few-shot diagnosis has received extensive attention recently. Existing methods rarely consider the consistency within and between heterogeneous data, leading to suboptimal diagnosis performance. To address this issue, a contrastive learning based collaborative modeling method for few-shot diagnosis is proposed. First, a heterogeneous data enhancement workflow with distribution consistency assessment is designed to acquire sufficient industrial process information, which also mitigates the inconsistency between enhanced data and original data. Following this, convolutional networks with customized structures are used to extract multimodal features from the heterogeneous signals. After that, the collaborative modeling and diagnosis module is devised through joint optimization of a contrastive loss and a cross-entropy loss, which shortens the distance between similar samples in the feature space and retains cross-structure consistency. Finally, the effectiveness and superiority of the proposed method are substantiated through simulated and real-world cases.
{"title":"Contrastive Learning Based Collaborative Modeling of Heterogeneous Data for Few-Shot Fault Diagnosis","authors":"Kai Zhong;Hengchang Zhu;Xiaoming Zhang;Darong Huang;Min Han","doi":"10.1109/TAI.2025.3577119","DOIUrl":"https://doi.org/10.1109/TAI.2025.3577119","url":null,"abstract":"Few-shot diagnosis has received extensive attention recently. Existing methods rarely consider the consistency within and between heterogeneous data, leading to suboptimal diagnosis performance. To address this issue, a contrastive learning based collaborative modeling for few-shot diagnosis is proposed. First of all, a heterogeneous data enhancement workflows with distribution consistency assessment is designed to acquire sufficient industrial process information, which can also mitigate the inconsistency between enhanced data and original data. Following this, convolutional networks with customized structures are used to extract the multimodal features from heterogeneous signals. After that, the collaborative modeling and diagnosis module is devised through the joint optimization of contrastive loss and cross entropy loss, which can shorten the distance of similar samples in feature space and retain cross structure consistency. Finally, the effectiveness and superiority of the proposed method are substantiated through simulated and the real world cases.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"486-496"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-04. DOI: 10.1109/TAI.2025.3576087
Yang Yang;Chao Wang;Lei Gong;Min Wu;Zhenghua Chen;Xuehai Zhou
Knowledge distillation has become increasingly popular for training compact neural network models that achieve performance comparable to larger models. To improve the effectiveness of knowledge distillation, enhancing the quality of the teacher knowledge is a crucial aspect to consider. While existing efforts have predominantly focused on optimizing the structure of teacher models and refining training procedures, we argue that there is untapped potential in further enhancing knowledge distillation by augmenting the teacher knowledge itself. In this article, we introduce FG-KD, a novel forward gradient-based framework specifically designed for augmenting teacher knowledge in knowledge distillation. FG-KD comprises two fundamental components: a feature reconstructor and a relation-aware enhancer. Both components employ a forward gradient-based approach to unlock the latent potential for enhancing the teacher knowledge, thereby providing an enriched foundation for knowledge distillation. The feature reconstructor operates at the feature level, optimizing the teacher knowledge by enhancing the encoding of high-dimensional spaces. The relation-aware enhancer, in turn, operates at the logit level, focusing on identifying and reinforcing the interclass and intraclass relationships within the teacher knowledge. Through extensive experiments on image recognition tasks, we demonstrate the effectiveness of FG-KD in improving the performance of various knowledge distillation techniques, regardless of the specific teacher–student model combination.
{"title":"FG-KD: A Novel Forward Gradient-Based Framework for Teacher Knowledge Augmentation","authors":"Yang Yang;Chao Wang;Lei Gong;Min Wu;Zhenghua Chen;Xuehai Zhou","doi":"10.1109/TAI.2025.3576087","DOIUrl":"https://doi.org/10.1109/TAI.2025.3576087","url":null,"abstract":"Knowledge distillation has become increasingly popular for training compact neural network models that can achieve comparable performance to larger models. In order to improve the effectiveness of knowledge distillation, enhancing the quality of the teacher knowledge is a crucial aspect to consider. While existing efforts have predominantly focused on optimizing the structure of teacher models and refining training procedures, we argue that there is untapped potential in further enhancing knowledge distillation through the augmentation of the teacher knowledge itself. In this article, we introduce FG-KD, a novel forward gradient-based framework specifically designed for augmenting teacher knowledge in knowledge distillation. FG-KD comprises two fundamental components: a feature reconstructor and a relation-aware enhancer. Both components employ a forward gradient-based approach to unlock the latent potential for enhancing teachers’ knowledge, thereby providing an enriched foundation for knowledge distillation. The feature reconstructor operates at the feature level, enabling the optimization of the teacher knowledge by enhancing the encoding of high-dimensional spaces. On the other hand, the relation-aware enhancer operates at the logit level, with a focus on identifying and reinforcing the interclass and intraclass relationships within the teacher knowledge. Through extensive experiments conducted on image recognition tasks, we demonstrate the effectiveness of FG-KD in improving the performance of various knowledge distillation techniques, regardless of the specific teacher–student model combinations.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"439-454"},"PeriodicalIF":0.0,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}