DIFF-FECG: A Conditional Diffusion-Based Method for Fetal ECG Extraction From Abdominal ECG
Pub Date: 2025-06-10 | DOI: 10.1109/TAI.2025.3578007
Zhenqin Chen;Yiwei Lin;Qiong Luo;Jinshan Xu
Fetal electrocardiography (FECG) is a crucial tool for assessing fetal cardiac health and pregnancy status. Direct invasive FECG provides reliable fetal heart rate signals, but it poses risks and is limited to use during labor. Conversely, non-invasive monitoring of the fetal heart is possible via abdominal electrocardiography (AECG), which detects fetal heart waveforms using electrodes positioned on the mother’s abdomen. However, this method is often subject to interference from maternal cardiac activity and other external sources. To address this issue, we propose a novel diffusion method, DIFF-FECG, aimed at improving the extraction of FECG signals from AECG recordings. This method leverages a condition-driven diffusion process to learn specific conditional probability distributions, enabling the effective separation of high-quality FECG signals from noisy AECG data. By adaptively managing the inherent non-Gaussian noise characteristics of the maternal ECG (MECG) within the AECG, DIFF-FECG achieves more effective FECG reconstruction. Furthermore, the quality of the generated FECG signals is enhanced by adding a reconstruction loss and performing multiple reconstructions. Experimental results on two public databases demonstrate that the proposed DIFF-FECG method yields satisfactory results, with an average Pearson correlation coefficient of 0.922 for the estimated FECG. These findings underscore the potential of diffusion probabilistic models in advancing FECG signal extraction techniques, thereby contributing to improved fetal health monitoring.
{"title":"DIFF-FECG: A Conditional Diffusion-Based Method for Fetal ECG Extraction From Abdominal ECG","authors":"Zhenqin Chen;Yiwei Lin;Qiong Luo;Jinshan Xu","doi":"10.1109/TAI.2025.3578007","DOIUrl":"https://doi.org/10.1109/TAI.2025.3578007","url":null,"abstract":"Fetal electrocardiography (FECG) is a crucial tool for assessing fetal cardiac health and pregnancy status. Direct invasive FECG provides reliable fetal heart rate signals, but poses risks and is limited to use during labor. Conversely, non-invasive monitoring of the fetal heart is possible via abdominal electrocardiography (AECG), which detects fetal heart waveforms using electrodes positioned on the mother’s abdomen. However, this method is often subject to interference from maternal cardiac activity and other external sources. To address this issue, we propose a novel diffusion method, DIFF-FECG, aimed at improving the extraction of FECG signals from AECG recordings. This method leverages a condition-driven diffusion process to learn specific conditional probability distributions, enabling the effective separation of high-quality FECG signals from noisy AECG data. By adaptively managing the inherent non-Gaussian noise characteristics of MECG within the AECG, DIFF-FECG achieves more effective FECG reconstruction. Furthermore, the quality of the generated FECG signals is also enhanced by adding reconstruction loss and multiple reconstructions. Experimental results on two public databases demonstrate that the proposed DIFF-FECG method yields satisfactory results, with an average Pearson correlation coefficient of 0.922 for the estimated FECG. These findings underscore the potential of diffusion probabilistic models in advancing FECG signal extraction techniques, thereby contributing to improved fetal health monitoring.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"534-546"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Successive Halving Based Online Ensemble Selection for Concept-Drift Adaptation
Pub Date: 2025-06-10 | DOI: 10.1109/TAI.2025.3578305
Jobin Wilson;Santanu Chaudhury;Brejesh Lall
Ensemble learning is one of the most successful approaches for concept-drift adaptation due to its versatility and high predictive performance. However, a practical challenge in using ensembles for high-speed data stream mining is the associated large computational cost. In this article, we introduce a computationally efficient heterogeneous ensemble classifier named successive halving ensemble (SUHEN), which adapts to concept drift using online ensemble selection. We model ensemble selection as a fixed-budget best-arm identification bandit problem and solve it using the successive halving algorithm (SHA). SUHEN identifies a single best-performing member for a stream segment and utilizes it for training and prediction until a drift is detected. Upon detecting drift, SHA identifies the new best performer for the segment. As stream characteristics evolve, manually choosing a fixed SHA budget would be challenging. To this end, we extend SUHEN by posing budget selection as a hyperparameter tuning problem and solving it using meta-learning. Our evaluation on 20 benchmark datasets reveals that SUHEN provides accuracy statistically on par with state-of-the-art ensemble algorithms while providing significant computational resource savings. This makes our proposal attractive for high-speed stream mining problems in resource-constrained settings.
{"title":"Successive Halving Based Online Ensemble Selection for Concept-Drift Adaptation","authors":"Jobin Wilson;Santanu Chaudhury;Brejesh Lall","doi":"10.1109/TAI.2025.3578305","DOIUrl":"https://doi.org/10.1109/TAI.2025.3578305","url":null,"abstract":"Ensemble learning is one of the most successful approaches for concept-drift adaptation due to its versatility and high predictive performance. However, a practical challenge in using ensembles for high-speed data stream mining is the associated large computational cost. In this article, we introduce a computationally efficient heterogeneous ensemble classifier named successive halving ensemble (SUHEN) which adapts to concept-drift using online ensemble selection. We model ensemble selection as a fixed budget best arm identification bandit problem and solve it using successive halving algorithm (SHA). SUHEN identifies a single best performing member for a stream segment and utilizes it for training and prediction until a drift is detected. Upon detecting drift, SHA identifies the new best performer for the segment. As stream characteristics evolve, manually choosing a fixed SHA budget would be challenging. To this end, we extend SUHEN by posing budget selection as a hyperparameter tuning problem and solve it using meta-learning. Our evaluation on 20 benchmark datasets reveal that SUHEN provides accuracy statistically at par with state-of-the-art ensemble algorithms, while providing significant computational resource savings. This makes our proposal attractive for high-speed stream mining problems in resource-constrained settings.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"547-561"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Event-Triggered Quantization-Based Predefined-Time Adaptive Fuzzy Control for Quadrotor Trajectory Tracking
Pub Date: 2025-06-10 | DOI: 10.1109/TAI.2025.3578011
Zhimin Zhou;Lin Zhao
In this letter, a predefined-time adaptive fuzzy trajectory tracking control scheme based on an event-triggered quantization framework is proposed for a quadrotor with inertial uncertainty, full-state constraints, and actuator saturation. First, a double-threshold event-triggered quantization mechanism is proposed to adaptively adjust the discretization degree of the control signals, reducing the communication burden while balancing the control accuracy. Subsequently, the computational complexity and filter error problems are solved by constructing a command filter and a filter error compensation mechanism. The unknown nonlinear dynamics of the quadrotor are handled through the approximation capability of an adaptive fuzzy logic system. In addition, an auxiliary signal and a smooth approximation function are combined to cope with actuator saturation. Using Lyapunov theory, the predefined-time stability of the system under full-state constraints is proven. Finally, the validity and superiority of the proposed algorithm are verified through a simulation example.
{"title":"Event-Triggered Quantization-Based Predefined-Time Adaptive Fuzzy Control for Quadrotor Trajectory Tracking","authors":"Zhimin Zhou;Lin Zhao","doi":"10.1109/TAI.2025.3578011","DOIUrl":"https://doi.org/10.1109/TAI.2025.3578011","url":null,"abstract":"In this letter, a predefined-time adaptive fuzzy trajectory tracking control based on an event-triggered quantization framework is proposed for a quadrotor with inertial uncertainty, full-state constraints, and actuator saturation. First, a double-threshold event-triggered quantization mechanism is proposed to adaptively adjust the discretization degree of the control signals, reducing the communication burden while balancing the control accuracy. Subsequently, the computational complexity and filter error problems are solved by constructing the command filter and filter error compensation mechanism. The unknown nonlinear dynamics of the quadrotor are handled through the approximation capability of an adaptive fuzzy logic system. In addition, an auxiliary signal and a smooth approximation function are combined to cope with actuator saturation. Using Lyapunov theory, the predefined-time stability of the system under full-state constraints is proven. Finally, the validity and superiority of the proposed algorithm have been verified through the simulation example.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"596-605"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Adversarial Training to Improve Uncertainty Quantification
Pub Date: 2025-06-10 | DOI: 10.1109/TAI.2025.3578004
Kuan Huang;Meng Xu;Yingfeng Wang
The success of adversarial attack methods suggests that a small input change may mislead a trained machine-learning model. For example, changing one pixel of an image may cause the trained model to misclassify the updated image. Uncertainty quantification is crucial for detecting misclassifications; hence, precise uncertainty quantification, meaning uncertainty estimates that closely align with prediction correctness, is essential. We assume that misclassified samples should exhibit high uncertainty while correctly classified samples should exhibit low uncertainty. To evaluate the performance of uncertainty quantification, we investigate the task of uncertainty-based misclassification detection under adversarial attack conditions. Our findings suggest that existing uncertainty quantification methods are unable to accurately identify misclassified predictions resulting from adversarial attacks due to training issues. We propose a simple adversarial training strategy for improving uncertainty quantification. Our results show that adversarial training improves the reliability of uncertainty quantification by better aligning uncertainty with prediction correctness. Specifically, we observe consistent improvements in misclassification detection performance, measured by AUC-ROC and AUC-PR, across clean and adversarial samples.
{"title":"Using Adversarial Training to Improve Uncertainty Quantification","authors":"Kuan Huang;Meng Xu;Yingfeng Wang","doi":"10.1109/TAI.2025.3578004","DOIUrl":"https://doi.org/10.1109/TAI.2025.3578004","url":null,"abstract":"The success of adversarial attack methods suggests a small input change may mislead a trained machine-learning model. For example, changing one pixel of an image may cause the trained model to misclassify this updated image. Uncertainty quantification is crucial for detecting misclassifications; hence, precise uncertainty quantification, meaning uncertainty estimates that closely align with prediction correctness, is essential. We assume that misclassified samples should exhibit high uncertainty while correctly classified samples should exhibit low uncertainty. To evaluate the performance of uncertainty quantification, we investigate the task of uncertainty-based misclassification detection under adversarial attack conditions. Our findings suggest that existing uncertainty quantification methods are unable to accurately identify misclassified predictions resulting from adversarial attacks due to training issues. We propose a simple adversarial training strategy for improving uncertainty quantification. Our results show that adversarial training improves the reliability of uncertainty quantification by better aligning uncertainty with prediction correctness. Specifically, we observe consistent improvements in misclassification detection performance, measured by AUC-ROC and AUC-PR, across clean and adversarial samples.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"522-533"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Variational Autoencoder-Based Parameter Learning of Bayesian Network With Multiple Latent Variables
Pub Date: 2025-06-06 | DOI: 10.1109/TAI.2025.3577601
Xinran Wu;Kun Yue;Liang Duan;Hongbo Xie;Huashuai Liu
Intelligent systems can become increasingly powerful by applying probabilistic inferences over the dependence relations among observed and latent variables, which can be represented by the Bayesian network (BN) with multiple latent variables (BNML). As the critical task in BNML construction, parameter learning is fulfilled by extending the classic EM algorithm in most existing methods, but the time complexity is exponential in the number of latent variables. To address this issue, we first propose to reduce the number of latent variables by training a vector quantized variational autoencoder (VQVAE). Specifically, we incorporate the initial probability parameters in conditional probability tables (CPTs) of the BNML as the regularization term of the VQVAE to guarantee that the probability parameters after reduction are similar (i.e., consistent) to those before reduction. Then, we incorporate efficient gradient calculations to augment the EM algorithm and propose an efficient algorithm for parameter learning of the BN with reduced latent variables (BNRL). Finally, we present an efficient method for probabilistic inferences in the BNRL by encoding evidence variables, decoding query variables, and updating query variable values via backpropagation. Experimental results on real and synthetic BNs demonstrate that our method outperforms state-of-the-art methods in both efficiency and effectiveness.
{"title":"Deep Variational Autoencoder-Based Parameter Learning of Bayesian Network With Multiple Latent Variables","authors":"Xinran Wu;Kun Yue;Liang Duan;Hongbo Xie;Huashuai Liu","doi":"10.1109/TAI.2025.3577601","DOIUrl":"https://doi.org/10.1109/TAI.2025.3577601","url":null,"abstract":"Intelligent systems could be increasingly powerful by applying probabilistic inferences over the dependence relations among observed and latent variables, which could be represented by the Bayesian network (BN) with multiple latent (BNML) variables. As the critical task in BNML construction, parameter learning is fulfilled by extending the classic EM algorithm in most of the existing methods, but the time complexity is exponential to the number of latent variables. To address this issue, we first propose to reduce the number of latent variables by training a vector quantized variational autoencoder (VQVAE). Specifically, we incorporate the initial probability parameters in conditional probability tables (CPTs) of BNML as the regularization term of VQVAE to guarantee that the probability parameters after reduction are similar (i.e., consistent) to those before reduction. Then, we incorporate efficient gradient calculations to augment the EM algorithm and propose the efficient algorithm for parameter learning of the BN with reduced latent (BNRL) variables. Finally, we present the efficient method for probabilistic inferences in BNRL by encoding evidence variable, decoding query variables and updating query variable values via backpropagation. Experimental results on real and synthetic BNs demonstrate that our method outperforms the state-of-the-art methods on efficiency and effectiveness.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"497-511"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrastive Learning Based Collaborative Modeling of Heterogeneous Data for Few-Shot Fault Diagnosis
Pub Date: 2025-06-06 | DOI: 10.1109/TAI.2025.3577119
Kai Zhong;Hengchang Zhu;Xiaoming Zhang;Darong Huang;Min Han
Few-shot diagnosis has received extensive attention recently. Existing methods rarely consider the consistency within and between heterogeneous data, leading to suboptimal diagnosis performance. To address this issue, a contrastive-learning-based collaborative modeling method for few-shot diagnosis is proposed. First, a heterogeneous data enhancement workflow with distribution consistency assessment is designed to acquire sufficient industrial process information, which also mitigates the inconsistency between enhanced data and original data. Following this, convolutional networks with customized structures are used to extract multimodal features from heterogeneous signals. After that, the collaborative modeling and diagnosis module is devised through the joint optimization of a contrastive loss and a cross-entropy loss, which shortens the distance between similar samples in the feature space and retains cross-structure consistency. Finally, the effectiveness and superiority of the proposed method are substantiated through simulated and real-world cases.
{"title":"Contrastive Learning Based Collaborative Modeling of Heterogeneous Data for Few-Shot Fault Diagnosis","authors":"Kai Zhong;Hengchang Zhu;Xiaoming Zhang;Darong Huang;Min Han","doi":"10.1109/TAI.2025.3577119","DOIUrl":"https://doi.org/10.1109/TAI.2025.3577119","url":null,"abstract":"Few-shot diagnosis has received extensive attention recently. Existing methods rarely consider the consistency within and between heterogeneous data, leading to suboptimal diagnosis performance. To address this issue, a contrastive learning based collaborative modeling for few-shot diagnosis is proposed. First of all, a heterogeneous data enhancement workflows with distribution consistency assessment is designed to acquire sufficient industrial process information, which can also mitigate the inconsistency between enhanced data and original data. Following this, convolutional networks with customized structures are used to extract the multimodal features from heterogeneous signals. After that, the collaborative modeling and diagnosis module is devised through the joint optimization of contrastive loss and cross entropy loss, which can shorten the distance of similar samples in feature space and retain cross structure consistency. Finally, the effectiveness and superiority of the proposed method are substantiated through simulated and the real world cases.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"486-496"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FG-KD: A Novel Forward Gradient-Based Framework for Teacher Knowledge Augmentation
Pub Date: 2025-06-04 | DOI: 10.1109/TAI.2025.3576087
Yang Yang;Chao Wang;Lei Gong;Min Wu;Zhenghua Chen;Xuehai Zhou
Knowledge distillation has become increasingly popular for training compact neural network models that can achieve performance comparable to larger models. To improve the effectiveness of knowledge distillation, enhancing the quality of the teacher knowledge is a crucial aspect to consider. While existing efforts have predominantly focused on optimizing the structure of teacher models and refining training procedures, we argue that there is untapped potential in further enhancing knowledge distillation through the augmentation of the teacher knowledge itself. In this article, we introduce FG-KD, a novel forward gradient-based framework specifically designed for augmenting teacher knowledge in knowledge distillation. FG-KD comprises two fundamental components: a feature reconstructor and a relation-aware enhancer. Both components employ a forward gradient-based approach to unlock the latent potential for enhancing the teacher’s knowledge, thereby providing an enriched foundation for knowledge distillation. The feature reconstructor operates at the feature level, enabling the optimization of the teacher knowledge by enhancing the encoding of high-dimensional spaces. The relation-aware enhancer, on the other hand, operates at the logit level, with a focus on identifying and reinforcing the interclass and intraclass relationships within the teacher knowledge. Through extensive experiments conducted on image recognition tasks, we demonstrate the effectiveness of FG-KD in improving the performance of various knowledge distillation techniques, regardless of the specific teacher–student model combinations.
{"title":"FG-KD: A Novel Forward Gradient-Based Framework for Teacher Knowledge Augmentation","authors":"Yang Yang;Chao Wang;Lei Gong;Min Wu;Zhenghua Chen;Xuehai Zhou","doi":"10.1109/TAI.2025.3576087","DOIUrl":"https://doi.org/10.1109/TAI.2025.3576087","url":null,"abstract":"Knowledge distillation has become increasingly popular for training compact neural network models that can achieve comparable performance to larger models. In order to improve the effectiveness of knowledge distillation, enhancing the quality of the teacher knowledge is a crucial aspect to consider. While existing efforts have predominantly focused on optimizing the structure of teacher models and refining training procedures, we argue that there is untapped potential in further enhancing knowledge distillation through the augmentation of the teacher knowledge itself. In this article, we introduce FG-KD, a novel forward gradient-based framework specifically designed for augmenting teacher knowledge in knowledge distillation. FG-KD comprises two fundamental components: a feature reconstructor and a relation-aware enhancer. Both components employ a forward gradient-based approach to unlock the latent potential for enhancing teachers’ knowledge, thereby providing an enriched foundation for knowledge distillation. The feature reconstructor operates at the feature level, enabling the optimization of the teacher knowledge by enhancing the encoding of high-dimensional spaces. On the other hand, the relation-aware enhancer operates at the logit level, with a focus on identifying and reinforcing the interclass and intraclass relationships within the teacher knowledge. Through extensive experiments conducted on image recognition tasks, we demonstrate the effectiveness of FG-KD in improving the performance of various knowledge distillation techniques, regardless of the specific teacher–student model combinations.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"439-454"},"PeriodicalIF":0.0,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Causal Disentanglement for Tackling Popularity Bias in Sequential Recommendation
Pub Date: 2025-06-04 | DOI: 10.1109/TAI.2025.3575554
An-An Liu;Yadong Zhao;Xin Wen;Rihao Chang;Weizhi Nie
Recommender systems typically exhibit severe popularity bias, with a few highly popular items receiving excessive exposure. Most existing studies tackle this bias in static settings. However, they neglect the dynamic nature of real-world recommendation scenarios and lack a thorough analysis of the root causes of the bias, which makes it challenging to accurately model and mitigate the dynamically changing popularity bias and capture genuine user preferences. To this end, we propose a causal disentanglement sequential recommendation model (CDSRec) based on time series analysis and hidden variable separation. Our model leverages Markov chains to analyze historical interaction data within sequential recommendations, capturing the dynamic variations of item popularity and user preferences. Employing causal inference, we disentangle the potential factors implicated in popularity bias. Specifically, user–item interactions are primarily driven by personalized demands and item popularity. Through empirical analysis from a temporal perspective, we reveal that popularity has both positive and negative impacts, and attribute them to stable intrinsic quality factors and dynamic external interference factors. We construct a causal directed acyclic graph to elucidate the temporal correlations among the different factors. Subsequently, we utilize historical interaction sequences and item-related attributes as auxiliary information to explicitly disentangle these factors as hidden variables. By reformulating the objective function to optimize the sequential variational autoencoder (VAE) framework, our model effectively mitigates the negative impact of external interference factors. Extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model.
{"title":"Causal Disentanglement for Tackling Popularity Bias in Sequential Recommendation","authors":"An-An Liu;Yadong Zhao;Xin Wen;Rihao Chang;Weizhi Nie","doi":"10.1109/TAI.2025.3575554","DOIUrl":"https://doi.org/10.1109/TAI.2025.3575554","url":null,"abstract":"Recommender systems typically exhibit severe popularity bias, with a few highly popular items receiving excessive exposure. Most existing studies tackle this bias in static settings. However, they neglect the dynamic nature of real-world recommendation scenarios and lack a thorough analysis into the root causes of bias, which makes it challenging to accurately model and mitigate the dynamically changing popularity bias and capture genuine user preferences. To this end, we propose a causal disentanglement sequential recommendation model (CDSRec) based on time series analysis and hidden variable separation. Our model leverages Markov chains to analyze historical interaction data within sequential recommendations, capturing the dynamic variations of item popularity and user preferences. Employing causal inference, we disentangle the potential factors implicated in popularity bias. Specifically, user–item interactions are primarily driven by personalized demands and item popularity. Through empirical analysis from a temporal perspective, we reveal that popularity has both positive and negative impacts, and attribute them to stable intrinsic quality factors and dynamic external interference factors. We construct a causal directed acyclic graph to elucidate the temporal correlations among different factors. Subsequently, we utilize historical interaction sequences and item-related attributes as auxiliary information to explicitly disentangle these factors as hidden variables. By reformulating the objective function to optimize the sequential VAE framework, our model effectively mitigates the negative impact of external interference factors. Extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"426-438"},"PeriodicalIF":0.0,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
pFedBL: Federated Bayesian Learning With Personalized Prior
Pub Date: 2025-06-04 | DOI: 10.1109/TAI.2025.3576201
Xinhui Yu;Arvin Tashakori;Liang Zou;Z. Jane Wang
Most existing federated learning (FL) frameworks use deterministic models as the task model, which may suffer from overfitting due to the small-scale data at client sides. Since Bayesian learning (BL) can quantify the uncertainty associated with both model parameters and prediction outcomes, there have been efforts to integrate BL with FL, where the global objective is transformed into posterior approximation using Bayesian optimization. Variational inference is commonly used in such efforts, which utilize the global distribution as the prior for the optimization of local Bayesian neural networks (BNNs) and thus eliminate the need for assigning specific prior distributions for clients. However, due to statistical heterogeneity across clients, the global distribution, representing the collective knowledge of all clients, may not serve as a precise prior for each client. To address this concern, we propose a federated Bayesian learning framework with personalized priors (pFedBL), where each client is assigned a local BNN. Specifically, we first introduce a KL-divergence-based distribution aggregation scheme to ensure the effectiveness of the global distribution. Meanwhile, under the mild assumption that the server has access to a general unlabeled dataset, the server uses the predictions as well as the predictive uncertainty of these data, derived from the local BNNs, to construct feature distributions. These distributions are then provided to clients for fine-tuning the global distribution, resulting in personalized priors. In addition, to ensure optimal integration of local and global data insights, we design an adaptive $\zeta$ strategy in the local objective function to balance the log-likelihood estimation term and the KL divergence term. We provide theoretical analysis regarding the upper bound of the averaged generalization error for the proposed pFedBL, and experimental results demonstrate its effectiveness on three datasets under different problem settings.
{"title":"pFedBL: Federated Bayesian Learning With Personalized Prior","authors":"Xinhui Yu;Arvin Tashakori;Liang Zou;Z. Jane Wang","doi":"10.1109/TAI.2025.3576201","DOIUrl":"https://doi.org/10.1109/TAI.2025.3576201","url":null,"abstract":"Most existing federated learning (FL) frameworks use deterministic models as the task model, which may suffer from overfitting due to small-scale data at client sides. Since Bayesian learning (BL) can quantify the uncertainty associated with both model parameters and prediction outcomes, there have been efforts to integrate BL with FL and the global objective is transformed into posterior approximation using Bayesian optimization. Variational inference is commonly used in such efforts which utilize the global distribution as the prior for the optimization of local Bayesian neural networks (BNNs) and thus eliminates the need for assigning specific prior distributions for clients. However, due to statistical heterogeneity across clients, the global distribution, representing the collective knowledge of all clients, may not be precise as client prior. To address this concern, we propose a federated Bayesian learning framework with personalized priors (pFedBL) where each client is assigned with a local BNN. Specifically, we first introduce a KL-divergence-based distribution aggregation scheme to ensure the effectiveness of the global distribution. Meanwhile, under the mild assumption that the server has access to a general unlabeled dataset, the server uses predictions as well as predictive uncertainty of these data, derived from local BNNs, to construct feature distributions. These distributions are then provided to clients for fine-tuning the global distribution, resulting in personalized priors. In addition, to ensure optimal integration of local and global data insights, we design an adaptive <inline-formula><tex-math>$zeta$</tex-math></inline-formula> strategy in the local objective function to balance the log-likelihood estimation term and the KL divergence term. We provide theoretical analysis regarding the upper bound of the averaged generalization error for the proposed pFedBL and experimental results demonstrate its effectiveness on three datasets under different problem settings.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"455-470"},"PeriodicalIF":0.0,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Soft Parameter Sharing Model for Cross-Problem Generalization in Vehicle Routing Problems
Pub Date: 2025-06-04 | DOI: 10.1109/TAI.2025.3576336
Yang Wang;Ya-Hui Jia;Wei-Neng Chen;Yi Mei
Neural combinatorial optimization (NCO) has achieved remarkable performance in solving individual vehicle routing problems (VRPs) by leveraging attention mechanisms. However, when generalizing across different problems, these methods perform poorly because the hard parameter sharing models they adopt cannot capture the commonalities and peculiarities of different problems. To address this limitation, we propose a novel multitask NCO method called the soft parameter sharing model (SPSM), which incorporates multiple independent attention modules and a gating network. SPSM allows the model to learn both universal patterns and individualized requirements without explicitly designating any module as shared or task-specific. When solving a specific VRP, the gating network decides the importance of the characteristics learned by each attention module. Additionally, we adopt maximum entropy reinforcement learning to maintain the diversity of the model during training, which prevents the model from overcommitting to a few dominant tasks or to the training tasks alone. Experimental results demonstrate that SPSM significantly enhances zero-shot generalization performance across ten unseen VRP variants and real-world benchmark instances.
{"title":"Soft Parameter Sharing Model for Cross-Problem Generalization in Vehicle Routing Problems","authors":"Yang Wang;Ya-Hui Jia;Wei-Neng Chen;Yi Mei","doi":"10.1109/TAI.2025.3576336","DOIUrl":"https://doi.org/10.1109/TAI.2025.3576336","url":null,"abstract":"Neural combinatorial optimization (NCO) has achieved remarkable performance in solving individual vehicle routing problems (VRPs) by leveraging attention mechanisms. However, when generalizing across different problems, these methods perform poorly because the hard parameter sharing models they adopted are unable to capture the commonalities and peculiarities of different problems. To address this limitation, we propose a novel multitask NCO method called the soft parameter sharing model (SPSM) that incorporates multiple independent attention modules and a gating network. SPSM allows the model to learn both universal patterns and individualized requirements without explicitly designating any module as shared or task-specific. When solving a specific VRP, the gating network may decide the importance of the characteristics learned by each attention module. Additionally, we adopt the maximum entropy reinforcement learning to maintain the diversity of the model in the training process, which can prevent the model from being greedy for some dominant tasks or only for the training tasks. Experimental results demonstrate that SPSM significantly enhances zero-shot generalization performance across ten unseen VRP variants and real-world benchmark instances.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"471-485"},"PeriodicalIF":0.0,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}