Pub Date: 2025-02-19 | DOI: 10.1007/s40747-025-01797-w
Li Liu, Jinrui Guo, Ziqi Yin, Rui Chen, Guojun Huang
Class imbalance is a prevalent issue in practical applications, which poses significant challenges for classifiers. The large margin distribution machine (LDM) introduces the margin distribution of samples to replace the traditional minimum margin, resulting in substantially enhanced classification performance. However, the hyperplane of LDM tends to be skewed toward the minority class, due to the optimization property for margin means. Moreover, the absence of non-deterministic options and of a measurement of the confidence level of samples further restricts the capability to manage uncertainty in imbalanced classification tasks. To solve these problems, we propose a novel three-way distance-based fuzzy large margin distribution machine (3W-DBFLDM). Specifically, we introduce a distance-based factor to mitigate the impact of sample size imbalance on classification results by increasing the distance weights of the minority class. Additionally, a three-way decision model is introduced to deal with uncertainty, and the model’s robustness is further enhanced by utilizing a fuzzy membership degree that reflects the importance level of each input point. Comparative experiments conducted on UCI datasets demonstrate that the 3W-DBFLDM model surpasses other models in classification accuracy, stability, and robustness. Furthermore, the cost comparison experiment validates that the 3W-DBFLDM model reduces the overall decision cost.
Title: "A novel three-way distance-based fuzzy large margin distribution machine for imbalance classification"
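The two ingredients this abstract describes — up-weighting the minority class and deferring low-confidence decisions to a boundary region — can be sketched in a few lines. This is an illustrative sketch, not the paper's formulation: the inverse-frequency weight formula and the (alpha, beta) thresholds are assumptions.

```python
from collections import Counter

def class_weights(labels):
    # Inverse-frequency weights: the minority class receives the larger
    # weight, echoing the goal of the distance-based factor.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def three_way_decide(score, alpha=0.7, beta=0.3):
    # Three-way decision: accept, reject, or defer to the boundary
    # region when the confidence score is inconclusive.
    if score >= alpha:
        return "positive"
    if score <= beta:
        return "negative"
    return "boundary"
```

With labels `[1, 1, 1, 0]`, the minority class 0 gets weight 2.0 versus 2/3 for class 1, so its misclassification costs more during training.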
Pub Date: 2025-02-19 | DOI: 10.1007/s40747-025-01791-2
Abdelhadi Limane, Farouq Zitouni, Saad Harous, Rihab Lakbichi, Aridj Ferhat, Abdulaziz S. Almazyad, Pradeep Jangir, Ali Wagdy Mohamed
Chaos theory, with its unique blend of randomness and ergodicity, has become a powerful tool for enhancing metaheuristic algorithms. In recent years, there has been a growing number of chaos-enhanced metaheuristic algorithms (CMAs), accompanied by a notable scarcity of studies that analyze and organize this field. To respond to this challenge, this paper comprehensively analyzes recent advances in CMAs from 2013 to 2024, proposing a novel classification scheme that systematically organizes prevalent and practical approaches for integrating chaos theory into metaheuristic algorithms based on their strategic roles. In addition, a list of 27 standard chaotic maps is explored, and a summary of the application domains where CMAs have demonstrably improved performance is provided. To experimentally demonstrate the capability of chaos theory to enhance metaheuristic algorithms that face common issues such as susceptibility to local optima, non-smooth transitions between global and local search phases, and decreased diversity, we developed a chaotic variant, termed C-RIME, of the recently proposed RIME optimizer, which also encounters these challenges to some extent. We tested C-RIME on the CEC2022 benchmark suite, rigorously analyzing numerical results using statistical metrics. Non-parametric statistical tests, including the Friedman and Wilcoxon signed-rank tests, were also used to validate the findings. The results demonstrated promising performance, with 14 out of 21 chaotic variants outperforming the non-chaotic variant, whereas the piecewise map-based variant achieved the best results. In addition, C-RIME outperformed ten state-of-the-art metaheuristic algorithms regarding solution quality and convergence speed.
Title: "Chaos-enhanced metaheuristics: classification, comparison, and convergence analysis"
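A CMA typically substitutes a chaotic map for the pseudo-random generator inside the optimizer: a deterministic map is iterated, and its orbit over the unit interval supplies the "random" numbers. A minimal sketch of two standard maps mentioned above (the logistic map and a piecewise linear map); the parameter values are illustrative, not those used in the paper.

```python
def logistic_map(x, r=4.0):
    # Logistic map; at r = 4 the orbit is chaotic on (0, 1).
    return r * x * (1.0 - x)

def piecewise_map(x, p=0.3):
    # Piecewise linear chaotic map on [0, 1), control parameter p.
    if x < p:
        return x / p
    if x < 0.5:
        return (x - p) / (0.5 - p)
    return piecewise_map(1.0 - x, p)

def chaotic_sequence(f, x0, n):
    # Iterate map f from seed x0 and collect n values, to be used
    # wherever the optimizer would draw uniform random numbers.
    seq, x = [], x0
    for _ in range(n):
        x = f(x)
        seq.append(x)
    return seq
```

The ergodicity of such orbits is what helps the search escape local optima while still covering the domain.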
Pub Date: 2025-02-19 | DOI: 10.1007/s40747-025-01816-w
Tao Wang, Ziming Ruan, Yuyan Wang, Chong Chen
Multi-task learning is important in reinforcement learning, where simultaneously training across different tasks allows for leveraging shared information among them, typically leading to better performance than single-task learning. While joint training of multiple tasks permits parameter sharing between tasks, the optimization challenge becomes crucial—identifying which parameters should be reused and managing potential gradient conflicts arising from different tasks. To tackle this issue, instead of uniform parameter sharing, we propose a decision reconfiguration network model, which we integrate into the Soft Actor-Critic (SAC) algorithm to address the optimization problems brought about by parameter sharing in multi-task reinforcement learning algorithms. The decision reconfiguration network model is designed to achieve information exchange across network layers by dynamically adjusting and reconfiguring the network hierarchy, which overcomes the inherent limitations of traditional network architectures in handling multi-task scenarios. The SAC algorithm based on the decision reconfiguration network model can train on multiple tasks simultaneously, effectively learning and integrating the relevant knowledge of each task. Finally, the proposed algorithm is evaluated in the multi-task environments of Meta-World, a benchmark for multi-task reinforcement learning containing robotic manipulation tasks, and the multi-task MuJoCo environment.
Title: "Control strategy of robotic manipulator based on multi-task reinforcement learning"
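The gradient conflicts this abstract identifies are commonly diagnosed with a dot-product test and resolved by projection, as in the PCGrad family of methods. The sketch below shows that generic mechanism only — it is not the reconfiguration network model proposed in the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conflicting(g1, g2):
    # Two task gradients conflict when they point in opposing
    # directions, i.e. their dot product is negative.
    return dot(g1, g2) < 0

def project_away(g1, g2):
    # PCGrad-style fix: if g1 conflicts with g2, remove from g1 its
    # component along g2 so the update no longer harms task 2.
    d = dot(g1, g2)
    if d >= 0:
        return list(g1)
    scale = d / dot(g2, g2)
    return [a - scale * b for a, b in zip(g1, g2)]
```

After projection, the surgically modified gradient is orthogonal to the conflicting task's gradient, so applying it does not increase that task's loss to first order.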
Trajectory prediction has become increasingly critical in various applications such as autonomous driving and robotic navigation. However, due to the significant variations in trajectory patterns across different scenarios, models trained in known environments often falter in unseen ones. To learn a generalized model that can directly handle unseen domains without requiring any model updating, we propose a novel tailored meta-learning-based trajectory prediction model called DTM. Our approach integrates a dual trajectory transformer (Dual_TT) equipped with an agent-consistency loss, facilitating a comprehensive exploration of both individual intentions and group dynamics across diverse scenarios. Building on this, we propose a tailored meta-learning framework (TMG) to simulate the generalization process between source and target domains during the training phase. In the task construction phase, we employ multi-dimensional labels to precisely define and distinguish between different domains. During the dual-phase parameter update, we partially fix crucial attention mechanism parameters and apply an attention alignment loss to harmonize domain-invariant and specific features. We also incorporate a Serial and Parallel Training (SPT) strategy to significantly enhance task processing and the model’s adaptability to domain shifts. Extensive testing across various domains demonstrates that our DTM model not only outperforms existing top-performing baselines on real-world datasets but also validates the effectiveness of our design through ablation studies.
Title: "Tailored meta-learning for dual trajectory transformer: advancing generalized trajectory prediction"
Authors: Feilong Huang, Zide Fan, Xiaohe Li, Wenhui Zhang, Pengfei Li, Ying Geng, Keqing Zhu
Pub Date: 2025-02-19 | DOI: 10.1007/s40747-025-01802-2
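The meta-learning outer loop that simulates source-to-target generalization can be illustrated with a Reptile-style update, where meta-parameters move toward the average of task-adapted parameters. This is a generic sketch under that assumption; the TMG framework described above is considerably more elaborate (multi-dimensional task labels, attention alignment, dual-phase updates).

```python
def meta_update(theta, task_thetas, lr=0.1):
    # Reptile-style outer step: after inner-loop adaptation produces
    # one parameter vector per task, nudge the meta-parameters theta
    # toward the average of those task-adapted vectors.
    avg = [sum(ts) / len(ts) for ts in zip(*task_thetas)]
    return [t + lr * (a - t) for t, a in zip(theta, avg)]
```

Each outer step therefore encodes what the adapted models agree on, which is the domain-invariant part the abstract aims to preserve.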
Pub Date: 2025-02-19 | DOI: 10.1007/s40747-025-01790-3
Ahmad Almadhor, Areej Alasiry, Shtwai Alsubai, Abdullah Al Hejaili, Urban Kovac, Sidra Abbas
Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition marked by difficulties in social skills, repetitive behaviours, and communication. Early and accurate diagnosis is essential for effective intervention and support. This paper proposes a secure and privacy-preserving framework for diagnosing ASD by integrating multimodal kinematic and eye movement sensory data, Deep Neural Networks (DNN), and Explainable Artificial Intelligence (XAI). Federated Learning (FL), a distributed machine learning approach, is utilized to ensure data privacy by training models across multiple devices without centralizing sensitive data. In our evaluation, we employ FL using a shallow DNN as the shared model and Federated Averaging (FedAvg) as the aggregation algorithm. We conduct experiments across two scenarios for each dataset: the first using FL with all features and the second using FL with features selected by XAI. The experiments, conducted with three clients over three rounds of training, show that the L_General dataset produces the best results, with Client 2 achieving an accuracy of 99.99% and Client 1 achieving 88%. This study underscores FL’s potential to preserve privacy and security while maintaining high diagnostic accuracy, making it a viable solution for healthcare applications involving sensitive data.
Title: "Explainable and secure framework for autism prediction using multimodal eye tracking and kinematic data"
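Federated Averaging (FedAvg), the aggregation algorithm named above, averages client parameter vectors weighted by each client's local dataset size, so no raw data leaves the devices. A self-contained sketch, with flattened parameter lists standing in for real model weights:

```python
def fedavg(client_weights, client_sizes):
    # FedAvg aggregation: the server combines client models as a
    # weighted average, weights proportional to local dataset size.
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    agg = [0.0] * n_params
    for w, s in zip(client_weights, client_sizes):
        for i, p in enumerate(w):
            agg[i] += p * s / total
    return agg
```

For example, two clients with weights `[1.0, 2.0]` and `[3.0, 4.0]` and sizes 1 and 3 aggregate to `[2.5, 3.5]`: the larger client dominates the average, as intended.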
Pub Date: 2025-02-19 | DOI: 10.1007/s40747-025-01795-y
Xiang Huang, Qiong Nong, Xiaobo Wang, Hongcheng Zhang, Kunpeng Du, Chunlin Yin, Li Yang, Bin Yan, Xuan Zhang
The rise of Large Language Models (LLMs) has further driven the development of text summarization techniques and has also brought more attention to the problem of hallucination in text summarization research. Existing LLM-based text summarization work typically uses In-Context Learning (ICL) to supply accurate (document, summary) pairs to the model, thus allowing the model to be more explicit in predicting the target. However, in this way, models are only told what to do, without being explicitly told what they must not do, which is likely to increase hallucinations because the model is left too much unconstrained freedom. In this paper, to alleviate the problem of hallucination in LLM-based text summarization, we propose CL2Sum, a method that combines Contrastive Learning (CL) and ICL for summarization. After analysing the generated summaries of LLMs and summarising their hallucination types, we provided the models with accurate summaries and summaries containing hallucinations as ICL instances, constructed either automatically or manually. It aims to guide the model to make accurate predictions according to positive samples while also avoiding hallucinations similar to those in negative samples. Finally, a series of comparative experiments were conducted on summary datasets of different lengths and languages. The results show that CL2Sum effectively alleviates the hallucination problem of text summaries while also improving the overall quality of the generated summaries. Moreover, it can be widely adapted to text summarization tasks in different scenarios with a certain degree of robustness.
Title: "CL2Sum: abstractive summarization via contrastive prompt constructed by LLMs hallucination"
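The contrastive prompt construction described above — positive summaries to imitate alongside hallucinated ones to avoid — can be sketched as a template function. The labels and layout here are illustrative assumptions, not CL2Sum's actual prompt format.

```python
def build_contrastive_prompt(document, positives, negatives):
    # Assemble an ICL prompt mixing faithful (positive) and
    # hallucinated (negative) example summaries. The instruction
    # wording and section labels are hypothetical.
    parts = ["Summarize the document faithfully."]
    for doc, summ in positives:
        parts.append(f"Document: {doc}\nGood summary: {summ}")
    for doc, summ in negatives:
        parts.append(f"Document: {doc}\nHallucinated summary (avoid): {summ}")
    parts.append(f"Document: {document}\nGood summary:")
    return "\n\n".join(parts)
```

The prompt ends at the completion point, so the LLM continues with a summary conditioned on both what to imitate and what to avoid.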
Pub Date: 2025-02-17 | DOI: 10.1007/s40747-025-01793-0
Zhang Chen, Hanlin Bian, Wei Zhu
With the development of data acquisition technology, a large amount of time-series data can be collected. However, handling too much data often leads to a waste of resources, so it is important to determine the minimum data size required for training. In this paper, a framework for neural ordinary differential equations based on incremental learning is discussed, which, compared with plain neural ordinary differential equations, enhances learning ability and can determine the minimum data size required for data modeling. This framework continuously updates the neural ordinary differential equations with newly added data while avoiding the addition of extra parameters. Once the preset accuracy is reached, the minimum data size needed for training can be determined. Furthermore, the minimum data size required for five classic models under various sampling rates is discussed. By incorporating new data, the framework enhances accuracy instead of increasing the depth and width of the neural network, and the close integration of data generation and training can significantly reduce the total time required. Theoretical analysis confirms convergence, while numerical results demonstrate that the framework offers superior predictive ability and reduced computation time compared to traditional neural differential equations.
Title: "Incremental data modeling based on neural ordinary differential equations"
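The stopping rule described — add data until a preset accuracy is reached, then report the data size — reduces to a simple search loop. In this sketch, `train_eval` is a hypothetical callback that trains the model on `n` samples and returns its validation error; it stands in for the incremental neural-ODE training step.

```python
def minimum_data_size(train_eval, sizes, target_error):
    # Grow the training set through the candidate sizes (ascending)
    # and stop at the first size whose validation error meets the
    # preset accuracy target; None means the target was never met.
    for n in sizes:
        if train_eval(n) <= target_error:
            return n
    return None
```

Because the model is updated incrementally rather than retrained from scratch, each pass through the loop reuses the previous parameters, which is where the reported time savings come from.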
Pub Date: 2025-02-17 | DOI: 10.1007/s40747-025-01789-w
Hanyan Liang, Shuyao Chai, Xixuan Zhao, Jiangming Kan
Single Image Defocus Deblurring (SIDD) remains challenging due to spatially varying blur kernels, particularly in processing high-resolution images where traditional methods often struggle with artifact generation, detail preservation, and computational efficiency. This paper presents Swin-Diff, a novel architecture integrating diffusion models with Transformer-based networks for robust defocus deblurring. Our approach employs a two-stage training strategy where a diffusion model generates prior information in a compact latent space, which is then hierarchically fused with intermediate features to guide the regression model. The architecture incorporates a dual-dimensional self-attention mechanism operating across channel and spatial domains, enhancing long-range modeling capabilities while maintaining linear computational complexity. Extensive experiments on three public datasets (DPDD, RealDOF, and RTF) demonstrate Swin-Diff’s superior performance, achieving average improvements of 1.37% in PSNR, 3.6% in SSIM, 2.3% in MAE, and 25.2% in LPIPS metrics compared to state-of-the-art methods. Our results validate the effectiveness of combining diffusion models with hierarchical attention mechanisms for high-quality defocus blur removal.
Title: "Swin-Diff: a single defocus image deblurring network based on diffusion model"
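The channel half of a dual-dimensional attention mechanism can be illustrated with a global-average descriptor per channel followed by a softmax over channels. This is a toy sketch on plain lists to show the idea, not Swin-Diff's actual module.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def channel_attention(features):
    # features: one list of values per channel. Each channel is
    # rescaled by the softmax of its global-average descriptor, so
    # channels with stronger responses are emphasized.
    desc = [sum(c) / len(c) for c in features]
    weights = softmax(desc)
    return [[v * w for v in c] for c, w in zip(features, weights)]
```

A spatial-attention branch would apply the same idea across positions instead of channels; combining the two gives the dual-dimensional scheme the abstract refers to.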
Most existing non-intrusive load monitoring (NILM) methods ignore the complementarity between the temporal and spatial characteristics of appliance power data. To tackle this problem, this paper proposes a spatio-temporal attention fusion network with a sequence-to-point learning scheme for load disaggregation. Initially, a temporal feature extraction module is designed to extract temporal features over a large temporal receptive field. Then, an asymmetric inception module is designed for multi-scale spatial feature extraction. The extracted temporal and spatial features are concatenated and fed into a polarized self-attention module to perform spatio-temporal attention fusion, followed by two dense layers for final NILM predictions. Extensive experiments on two public datasets, REDD and UK-DALE, show the validity of the proposed method, which outperforms the other compared methods on NILM tasks.
{"title":"Sequence-to-point learning based on spatio-temporal attention fusion network for non-intrusive load monitoring","authors":"Shiqing Zhang, Youyao Fu, Xiaoming Zhao, Jiangxiong Fang, Yadong Liu, Xiaoli Wang, Baochang Zhang, Jun Yu","doi":"10.1007/s40747-025-01803-1","DOIUrl":"https://doi.org/10.1007/s40747-025-01803-1","url":null,"abstract":"<p>Most of existing non-invasive load monitoring (NILM) methods usually ignore the complementarity between temporal and spatial characteristics of appliance power data. To tackle this problem, this paper proposes a spatio-temporal attention fusion network with a sequence-to-point learning scheme for load disaggregation. Initially, a temporal feature extraction module is designed to extract temporal features over a large temporal receptive field. Then, an asymmetric inception module is designed for a multi-scale spatial feature extraction. The extracted temporal features and spatial features are concatenated, and fed into a polarized self-attention module to perform a spatio-temporal attention fusion, followed by two dense layers for final NILM predictions. Extensive experiments on two public datasets such as REDD and UK-DALE show the validity of the proposed method, outperforming the other used methods on NILM tasks.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"4 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143427274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
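The sequence-to-point scheme used above maps a fixed-length window of the aggregate mains signal to the appliance power at the window's midpoint. A minimal numpy sketch of that windowing, with an invented function name and a toy window size (the paper's actual preprocessing is not specified in the abstract):

```python
import numpy as np

def seq2point_windows(mains, window=5):
    # Slide a fixed-length window over the aggregate mains signal;
    # each window is one model input, and the prediction target is
    # the appliance power at the window's midpoint index.
    half = window // 2
    inputs = np.stack([mains[i - half:i + half + 1]
                       for i in range(half, len(mains) - half)])
    centers = np.arange(half, len(mains) - half)  # midpoint indices
    return inputs, centers

mains = np.arange(10, dtype=float)
X, idx = seq2point_windows(mains, window=5)
print(X.shape)  # (6, 5)
print(idx[0])   # 2
```

Predicting a single point per window (rather than a whole sub-sequence) lets every output position see a full symmetric context, which is the usual motivation for sequence-to-point over sequence-to-sequence disaggregation.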
Pub Date : 2025-02-12DOI: 10.1007/s40747-025-01804-0
Meng Zhang, Wenzhong Yang, Liejun Wang, Zhonghua Wu, Danny Chen
Micro-expressions (MEs) are unconscious and involuntary reactions that genuinely reflect an individual’s inner emotional state, making them valuable in the fields of emotion analysis and behavior recognition. MEs are characterized by subtle changes within specific facial action units, and effective feature learning and fusion tailored to these characteristics still require in-depth research. To address this challenge, this paper proposes a novel hierarchical feature aggregation network (HFA-Net). In the local branch, a multi-scale attention (MSA) block is proposed to capture subtle facial changes and local information. The global branch introduces a retentive-networks-meet-vision-transformers (RMT) block to establish dependencies between holistic facial features and structural information. Considering that single-scale features are insufficient to fully capture the subtleties of MEs, a multi-level feature aggregation (MLFA) module is proposed to extract and fuse features from different levels across the two branches, preserving more comprehensive feature information. To enhance the representation of key features, an adaptive attention feature fusion (AAFF) module is designed to focus on the most useful and relevant feature channels. Extensive experiments conducted on the SMIC, CASME II, and SAMM benchmark databases demonstrate that the proposed HFA-Net outperforms current state-of-the-art methods. Additionally, ablation studies confirm the superior discriminative capability of HFA-Net when learning feature representations from limited ME samples. Our code is publicly available at https://github.com/tairuwu/HFA-Net.
{"title":"HFA-Net: hierarchical feature aggregation network for micro-expression recognition","authors":"Meng Zhang, Wenzhong Yang, Liejun Wang, Zhonghua Wu, Danny Chen","doi":"10.1007/s40747-025-01804-0","DOIUrl":"https://doi.org/10.1007/s40747-025-01804-0","url":null,"abstract":"<p>Micro-expressions (MEs) are unconscious and involuntary reactions that genuinely reflect an individual’s inner emotional state, making them valuable in the fields of emotion analysis and behavior recognition. MEs are characterized by subtle changes within specific facial action units, and effective feature learning and fusion tailored to these characteristics still require in-depth research. To address this challenge, this paper proposes a novel hierarchical feature aggregation network (HFA-Net). In the local branch, the multi-scale attention (MSA) block is proposed to capture subtle facial changes and local information. The global branch introduces the retentive meet transformers (RMT) block to establish dependencies between holistic facial features and structural information. Considering that single-scale features are insufficient to fully capture the subtleties of MEs, a multi-level feature aggregation (MLFA) module is proposed to extract and fuse features from different levels across the two branches, preserving more comprehensive feature information. To enhance the representation of key features, an adaptive attention feature fusion (AAFF) module is designed to focus on the most useful and relevant feature channels. Extensive experiments conducted on the SMIC, CASME II, and SAMM benchmark databases demonstrate that the proposed HFA-Net outperforms current state-of-the-art methods. Additionally, ablation studies confirm the superior discriminative capability of HFA-Net when learning feature representations from limited ME samples. 
Our code is publicly available at https://github.com/tairuwu/HFA-Net.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"18 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143393219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
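The MLFA-then-AAFF pipeline above (aggregate features from several levels, then reweight channels adaptively) can be illustrated with a toy numpy sketch. The function name, the concatenation-based aggregation, and the softmax reweighting are assumptions for illustration only; HFA-Net's actual modules are defined in the linked repository.

```python
import numpy as np

def mlfa_aaff(level_feats):
    # level_feats: list of (C,) feature vectors taken from different
    # network levels. Concatenate them (multi-level aggregation), then
    # reweight channels with softmax scores computed from the fused
    # vector itself (adaptive channel attention).
    fused = np.concatenate(level_feats)
    w = np.exp(fused - fused.max())
    w = w / w.sum()                 # softmax over all fused channels
    return fused * w * fused.size   # rescale to keep overall magnitude

f1, f2 = np.ones(3), 2 * np.ones(3)
out = mlfa_aaff([f1, f2])
print(out.shape)  # (6,)
```

The softmax makes stronger channels (here, the level-2 features with value 2) receive proportionally larger weights, which is the intuition behind focusing on "the most useful and relevant feature channels."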