Pub Date: 2025-01-03 DOI: 10.1007/s40747-024-01727-2
Zhijie Zhang, Huang Bai, Ljubiša Stanković, Junmei Sun, Xiumei Li
Compressive sensing (CS) has been widely applied in the signal processing field, especially for image reconstruction tasks. CS simplifies the sampling and compression procedures but shifts the difficulty to nonlinear reconstruction. Traditional CS reconstruction algorithms are usually iterative and rest on a complete theoretical foundation; nevertheless, they suffer from high computational complexity. Popular deep network-based methods achieve high-precision CS reconstruction at satisfactory speed but lack theoretical analysis and interpretability. To combine the merits of these two kinds of CS methods, deep unfolding networks (DUNs) have been developed. In this paper, a novel DUN named supervised transmission-augmented network (SuperTA-Net) is proposed. Based on the framework of our previous work PIPO-Net, a multi-channel transmission strategy is put forward to reduce the loss of critical information between modules and improve the reliability of the data. Besides, to avoid issues such as high information redundancy and heavy computational burden when too many channels are set, an attention-based supervision scheme is presented that dynamically adjusts the weight of each channel and removes redundant information. Furthermore, noting the difference between the original image and the output of SuperTA-Net, a reinforcement network is developed, whose main component, a lightweight residual recovery network (RR-Net), can be added to reinforce all kinds of CS reconstruction networks. Experiments on reconstructing CS images demonstrate the effectiveness of the proposed networks.
Title: A novel transmission-augmented deep unfolding network with consideration of residual recovery (Complex & Intelligent Systems)
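DUNs such as SuperTA-Net unroll an iterative solver into network phases. As a minimal illustration of the unfolding idea (not the SuperTA-Net architecture itself), here is one unfolded ISTA recursion for sparse CS reconstruction in NumPy; the measurement matrix, threshold `lam`, and phase count are placeholder choices:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the l1 norm: sign(v) * max(|v| - t, 0)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_unfolded(y, A, n_phases=10, lam=0.05):
    # Each "phase" mirrors one unfolded ISTA iteration:
    #   x <- soft(x - (1/L) A^T (A x - y), lam / L),  L = ||A||_2^2
    L = np.linalg.norm(A, 2) ** 2
    x = A.T @ y  # simple linear initial reconstruction
    for _ in range(n_phases):
        x = soft_threshold(x - (A.T @ (A @ x - y)) / L, lam / L)
    return x
```

With step size 1/L, each phase is guaranteed not to increase the l1-regularized least-squares objective, which is why stacking phases as network layers is well-behaved.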
Pub Date: 2025-01-02 DOI: 10.1007/s40747-024-01717-4
Chengrong Yang, Qiwen Jin, Fei Du, Jing Guo, Yujue Zhou
By leveraging large-scale image-text paired data for pre-training, a model can efficiently learn the alignment between images and text, significantly advancing zero-shot learning (ZSL) in intelligent medical image analysis. However, cross-modal heterogeneity, false negatives in image-text pairs, and domain shift make it difficult for existing methods to effectively learn the deep semantic relationships between images and text. To address these challenges, we propose PLZero, a generalized ZSL framework for multi-label chest X-ray recognition based on placeholder learning. Specifically, we first introduce a jointed embedding space learning module (JESL) to encourage the model to better capture the diversity among different labels. Second, we propose a hallucinated class generation module (HCG), which generates hallucinated classes by feature diffusion and feature fusion over the visual and semantic features of seen classes, using these hallucinated classes as placeholders for unseen classes. Finally, we propose a hallucinated class-based prototype learning module (HCPL), which leverages contrastive learning to keep the distribution of hallucinated classes around seen classes without significant deviation from the original data, and encourages high dispersion of seen-class prototypes to create sufficient space for inserting unseen-class samples. Extensive experiments demonstrate that our method generalizes well and achieves the best performance across three classic and challenging chest X-ray datasets: NIH Chest X-ray 14, CheXpert, and ChestX-Det10. Notably, our method outperforms others even when the number of unseen classes exceeds the experimental settings of other methods. The code is available at: https://github.com/jinqiwen/PLZero.
Title: PLZero: placeholder based approach to generalized zero-shot learning for multi-label recognition in chest radiographs
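The HCG module's fusion of seen-class features into placeholders can be sketched, under heavy simplification, as convex mixing of seen-class prototype vectors; the random pairing rule and mixing range below are illustrative assumptions, not the paper's design:

```python
import numpy as np

def hallucinate_classes(seen_protos, n_hallucinated, alpha_range=(0.3, 0.7), seed=0):
    # Convex fusion of random pairs of seen-class prototypes; each fused
    # vector acts as a placeholder for a possible unseen class.
    rng = np.random.default_rng(seed)
    n_seen, dim = seen_protos.shape
    out = np.empty((n_hallucinated, dim))
    for i in range(n_hallucinated):
        a, b = rng.choice(n_seen, size=2, replace=False)
        w = rng.uniform(*alpha_range)
        out[i] = w * seen_protos[a] + (1.0 - w) * seen_protos[b]
    return out
```

Because each placeholder is a convex combination of two seen prototypes, it stays inside the seen-class feature region, matching the abstract's requirement that hallucinated classes not deviate significantly from the original data.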
Pub Date: 2025-01-02 DOI: 10.1007/s40747-024-01714-7
Hongbin Zhang, Jin Zhang, Xuan Zhong, Ya Feng, Guangli Li, Xiong Li, Jingqin Lv, Donghong Ji
Retinal image segmentation is crucial for the early diagnosis of diseases such as diabetes and hypertension. Current methods face many challenges, such as inadequate multi-scale semantics and insufficient global information. In view of this, we propose a multi-scale semantics mining and tiny details enhancement network (MSM-TDE). First, a multi-scale feature input module is designed to capture multi-scale semantic information from the source image. Then a new multi-scale attention guidance module is constructed to mine local multi-scale semantics, while a global semantics enhancement module extracts global multi-scale semantics. Additionally, an auxiliary vessel detail enhancement branch using dynamic snake convolution enhances the tiny vessel details. Extensive experimental results on four public datasets validate the superiority of MSM-TDE, which obtains competitive performance with satisfactory model complexity. Notably, this study provides an innovative idea of mining multi-scale semantics by diverse methods.
Title: MSM-TDE: multi-scale semantics mining and tiny details enhancement network for retinal vessel segmentation
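A multi-scale feature input can be approximated by an average-pooled resolution pyramid fed to the network alongside the full-resolution image; this toy NumPy version (the scale factors and pooling choice are assumptions, not MSM-TDE's exact module) conveys the idea:

```python
import numpy as np

def multi_scale_inputs(img, scales=(1, 2, 4)):
    # Average-pool a 2D image by each factor to build a resolution pyramid.
    outs = []
    for s in scales:
        h, w = img.shape[0] // s * s, img.shape[1] // s * s
        pooled = img[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
        outs.append(pooled)
    return outs
```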
Pub Date: 2025-01-02 DOI: 10.1007/s40747-024-01748-x
Jiale Hu, Xiang Li, Changzheng Liu, Ronghua Zhang, Junwei Tang, Yi Sun, Yuedong Wang
Recent research has shown that deep learning models are vulnerable to adversarial attacks, including gradient attacks, which can lead to incorrect outputs. Existing gradient attack methods typically rely on repetitive multistep strategies to improve their attack success rates, resulting in long training times and severe overfitting. To address these issues, we propose an adaptive perturbation-based gradient attack method with dual-loss optimization (APDL). The method adaptively adjusts the single-step perturbation magnitude based on an exponential distance function, thereby accelerating convergence: APDL converges in fewer than 10 iterations and achieves a high attack success rate, outperforming traditional nonadaptive methods. Furthermore, to increase the transferability of gradient attacks such as APDL across different models and to reduce overfitting to the training model, we introduce a triple-differential logit fusion (TDLF) method grounded in knowledge distillation principles. This approach mitigates the edge effects associated with gradient attacks by adjusting the hardness and softness of labels.
Title: APDL: an adaptive step size method for white-box adversarial attacks
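The abstract does not give APDL's exact schedule, but the idea of a single-step magnitude that shrinks via an exponential distance term can be sketched on a toy linear scorer; the model `f(x) = w.x + b`, the negative-margin loss, and the schedule constants are all illustrative assumptions:

```python
import numpy as np

def adaptive_sign_attack(x, w, b, y, eps=0.5, eta=0.2, steps=10):
    # Toy white-box attack on f(x) = w.x + b with label y in {-1, +1}.
    # The step size decays exponentially as the perturbation approaches
    # the l_inf budget, mimicking an adaptive single-step magnitude.
    x_adv = x.copy()
    for _ in range(steps):
        d = np.max(np.abs(x_adv - x)) / eps        # distance ratio in [0, 1]
        step = eta * np.exp(-d)                    # adaptive magnitude
        grad = -y * w                              # gradient of -y*f(x) w.r.t. x
        x_adv = x_adv + step * np.sign(grad)
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # project into the budget
    return x_adv
```

Early steps are large (d near 0), later steps shrink, so the iterate settles quickly instead of oscillating at the budget boundary.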
Pub Date: 2025-01-02 DOI: 10.1007/s40747-024-01708-5
Xiangzuo Huo, Shengwei Tian, Long Yu, Wendong Zhang, Aolun Li, Qimeng Yang, Jinmiao Song
Esophageal cancer is a globally significant but understudied cancer with high mortality rates. Its staging and differentiation are crucial factors in determining a patient's prognosis and surgical treatment plan, as well as improving the chances of survival. Endoscopy and histopathological examination are considered the gold standard for esophageal cancer diagnosis. Previous studies that applied deep learning to esophageal cancer analysis, however, have been limited to single-modal features, resulting in inadequate classification results. In response to these limitations, multi-modal learning has emerged as a promising alternative for medical image analysis tasks. In this paper, we propose MM-HiFuse, a hierarchical feature fusion network for multi-modal multitask learning that improves the classification accuracy of esophageal cancer staging and differentiation level. The architecture combines low-level to deep-level features of both pathological and endoscopic images to achieve accurate classification. The key characteristics of MM-HiFuse include: (i) a parallel hierarchy of convolution and self-attention layers specifically designed for pathological and endoscopic image features; (ii) a multi-modal hierarchical feature fusion module (MHF) and a new multitask weighted combination loss function. These components enable effective extraction of multi-modal representations at different semantic scales and make the multiple tasks mutually complementary, leading to improved classification performance. Experimental results demonstrate that MM-HiFuse outperforms single-modal methods in esophageal cancer staging and differentiation classification. Our findings provide evidence for the early diagnosis and accurate staging of esophageal cancer and offer new inspiration for applying multi-modal multitask learning in medical image analysis.
Code is available at https://github.com/huoxiangzuo/MM-HiFuse.
Title: MM-HiFuse: multi-modal multi-task hierarchical feature fusion for esophagus cancer staging and differentiation classification
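A multitask weighted combination loss can take many forms; one common choice, shown purely as an example (learnable per-task log-variances in the style of homoscedastic-uncertainty weighting, not necessarily the paper's formulation), is:

```python
import math

def multitask_weighted_loss(losses, log_vars):
    # One log-variance s per task: tasks with higher uncertainty are
    # down-weighted by exp(-s), and the additive s term discourages the
    # model from inflating all uncertainties to zero out the loss.
    total = 0.0
    for L, s in zip(losses, log_vars):
        total += math.exp(-s) * L + s
    return total
```

With all log-variances at zero this reduces to a plain sum of the per-task losses, which makes it easy to sanity-check.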
Pub Date: 2024-12-31 DOI: 10.1007/s40747-024-01736-1
Ling Xing, Jinxin Liu, Qi Zhang, Honghai Wu, Huahong Ma, Xiaohui Zhang
Link prediction infers the likelihood of a connection between two nodes from network structural information, aiming to foresee potential latent relationships within the network. In social networks, nodes typically represent users, and links denote the relationships between them. However, some user nodes in social networks are hidden due to unknown or incomplete link information. Predicting the implicit links between these nodes and other user nodes is hampered by incomplete network structures and partial node information, which hurts the accuracy of link prediction. To address these issues, this paper introduces an implicit link prediction algorithm based on an extended social graph (ILP-ESG). The algorithm completes user attribute information through a multi-task fusion attribute inference framework built on associative learning. Subsequently, an extended social graph is constructed from user attribute relations, social relations, and discourse interaction relations, enriching user nodes with comprehensive representational information. A semi-supervised graph autoencoder then extracts features from the three types of relationships in the extended social graph, obtaining feature vectors that effectively represent the multidimensional relationship information of users. This facilitates the inference of potential implicit links between nodes and the prediction of hidden users' relationships with others. The algorithm is validated on real datasets: on the Facebook dataset it improves the AUC and Precision metrics by an average of 5.17% and 9.25% over the baseline method, and on the Instagram dataset by 7.71% and 16.16%, respectively. It exhibits good stability and robustness, ensuring accurate link prediction.
Title: Implicit link prediction based on extended social graph
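For contrast with learned approaches such as ILP-ESG, the classical structural baseline scores each candidate (non-)edge by the number of common neighbors; this sketch is only a baseline illustration, not the proposed algorithm:

```python
import itertools

def common_neighbor_scores(adj):
    # Score every non-edge (u, v) by |N(u) & N(v)|, a classical
    # structural link-prediction baseline on an adjacency matrix.
    n = len(adj)
    nbrs = [set(j for j in range(n) if adj[i][j]) for i in range(n)]
    scores = {}
    for u, v in itertools.combinations(range(n), 2):
        if not adj[u][v]:
            scores[(u, v)] = len(nbrs[u] & nbrs[v])
    return scores
```

Ranking non-edges by these scores and sweeping a threshold is exactly what AUC and Precision, the metrics quoted above, evaluate.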
Pub Date: 2024-12-30 DOI: 10.1007/s40747-024-01733-4
Jiacheng Huang, Long Chen, Xiaoyin Yi, Ning Yu
Deep neural networks are known to be susceptible to diverse forms of adversarial attacks in natural language processing, a security issue that poses substantial risks and erodes users' trust in artificial intelligence applications. Meanwhile, quantum theory-inspired models that represent word composition as a quantum mixture of words have modeled non-linear semantic interaction. However, the current literature does not model the non-linear semantic interaction between sentences, and thus does not exploit the potential of the quantum probabilistic description for improving robustness in adversarial settings. In the present study, a novel quantum theory-inspired inter-sentence semantic interaction model is proposed to enhance adversarial robustness by fusing contextual semantics. More specifically, we analyze why humans are able to understand textual adversarial examples and observe a crucial point: humans are adept at associating information from the context to comprehend a paragraph. Guided by this insight, the input text is segmented into subsentences, and the model simulates contextual comprehension by representing each subsentence as a particle within a mixture system, utilizing a density matrix to model inter-sentence interactions. A loss function integrating cross-entropy and orthogonality losses encourages the orthogonality of measurement states. Comprehensive experiments validate the efficacy of the proposed methodology: it outperforms baseline models, and even commercial applications based on large language models, in accuracy across diverse adversarial attack scenarios, showing its potential for enhancing the robustness of neural networks under adversarial attacks.
Title: Quantum theory-inspired inter-sentence semantic interaction model for textual adversarial defense
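The mixture-of-subsentences idea can be made concrete: each subsentence embedding becomes a rank-1 projector, and the paragraph is represented by their convex mixture (a density matrix). The uniform weights here are an assumption; the paper may weight subsentences differently:

```python
import numpy as np

def sentence_density_matrix(subsentence_embeddings, weights=None):
    # Quantum-style mixed state: a convex mixture of rank-1 projectors
    # built from unit-normalized subsentence embeddings. The result is
    # symmetric, positive semi-definite, and has trace 1.
    E = np.asarray(subsentence_embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    k = E.shape[0]
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights) / np.sum(weights)
    return sum(wi * np.outer(e, e) for wi, e in zip(w, E))
```

Inter-sentence interaction then shows up in the off-diagonal structure of the matrix, which a plain bag of independent sentence vectors cannot capture.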
Pub Date: 2024-12-30 DOI: 10.1007/s40747-024-01697-5
Shaofu Lin, Shiwei Zhou, Han Jiao, Mengzhen Wang, Haokang Yan, Peng Dou, Jianhui Chen
Chronic disease risk prediction based on electronic health records (EHR) is an important research direction of Internet healthcare. Current studies have mainly focused on developing well-designed deep learning models that predict disease risk from large-scale, high-quality longitudinal EHR data. However, in real-world scenarios, people's medical habits and the low prevalence of diseases often lead to few-shot and imbalanced longitudinal EHR data. This has become an urgent challenge for EHR-based chronic disease risk prediction. To address this challenge, this study combines EHR-based pre-training and deep reinforcement learning to develop a novel chronic disease risk prediction model called CDR-Detector. The model adopts the Q-learning architecture with a custom reward function. To improve the model's few-shot learning ability, a self-adaptive EHR-based pre-training model with two new pre-training tasks is developed to mine valuable dependencies from single-visit EHR data. To address data imbalance, a dual experience replay strategy helps the model select representative data samples and accelerates convergence on the imbalanced EHR data. A group of experiments has been conducted on real personal physical examination data.
{"title":"CDR-Detector: a chronic disease risk prediction model combining pre-training with deep reinforcement learning","authors":"Shaofu Lin, Shiwei Zhou, Han Jiao, Mengzhen Wang, Haokang Yan, Peng Dou, Jianhui Chen","doi":"10.1007/s40747-024-01697-5","journal":"Complex & Intelligent Systems","PeriodicalIF":5.8,"publicationDate":"2024-12-30"}
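The dual experience replay idea described above can be pictured as two buffers, one for common (majority-class) transitions and one for rare (minority-class) ones, sampled at a fixed ratio so rare samples are replayed more often. This is only an illustrative sketch of the general technique, not CDR-Detector's implementation; the class name, the `rare_fraction` parameter, and the fixed sampling ratio are our own assumptions:

```python
import random
from collections import deque

class DualReplayBuffer:
    """Illustrative dual experience replay: common and rare transitions are
    stored separately and mixed at a fixed ratio per batch, so that rare
    (minority-class) samples are not drowned out by the majority class."""

    def __init__(self, capacity=10000, rare_fraction=0.5, seed=0):
        self.common = deque(maxlen=capacity)  # majority-class transitions
        self.rare = deque(maxlen=capacity)    # minority-class transitions
        self.rare_fraction = rare_fraction
        self.rng = random.Random(seed)

    def push(self, transition, is_rare):
        # Route each transition to the buffer matching its class.
        (self.rare if is_rare else self.common).append(transition)

    def sample(self, batch_size):
        # Take up to rare_fraction of the batch from the rare buffer,
        # then fill the remainder from the common buffer.
        n_rare = min(int(batch_size * self.rare_fraction), len(self.rare))
        n_common = min(batch_size - n_rare, len(self.common))
        return (self.rng.sample(list(self.rare), n_rare)
                + self.rng.sample(list(self.common), n_common))
```

With `rare_fraction=0.5`, half of each batch comes from the rare buffer whenever enough rare samples exist, which is one simple way to counteract class imbalance during Q-learning updates.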
Pub Date : 2024-12-30; DOI: 10.1007/s40747-024-01722-7
Yumei Tan, Haiying Xia, Shuxiang Song
Noisy labels are unavoidable in facial expression recognition (FER) tasks, significantly hindering FER performance in real-world scenarios. Recent advances tackle this problem by leveraging uncertainty for sample partitioning or by constructing label distributions. However, these approaches depend primarily on the labels themselves, leading to confirmation bias and performance degradation. We argue that mining both label-independent features and label-dependent information can mitigate the confirmation bias induced by noisy labels. In this paper, we propose MCR, a simple yet effective method that mines label-free consistency regularization to learn representations robust to noisy labels. MCR incorporates three label-free consistency regularizations: instance-level embedding consistency, pairwise distance consistency, and neighbour consistency. First, instance-level embedding consistency regularization learns instance-level discriminative information from identical facial samples under perturbations in an unsupervised manner, which helps mitigate the inherent noise in the data. Second, pairwise distance consistency regularization regularizes the classifier and alleviates the bias induced by noisy labels. Finally, neighbour consistency regularization further strengthens the model's discriminative capability against noise. Benefiting from these three label-free consistency regularizations, MCR learns discriminative and robust representations under label noise. Extensive experiments demonstrate the superior performance of MCR on three popular in-the-wild facial expression datasets: RAF-DB, FERPlus, and AffectNet. Moreover, MCR generalizes well to other datasets with noisy labels, such as CIFAR100 and Tiny-ImageNet.
{"title":"Mining label-free consistency regularization for noisy facial expression recognition","authors":"Yumei Tan, Haiying Xia, Shuxiang Song","doi":"10.1007/s40747-024-01722-7","journal":"Complex & Intelligent Systems","PeriodicalIF":5.8,"publicationDate":"2024-12-30"}
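The first two regularizations above can be illustrated with a small NumPy sketch: the instance-level term penalizes disagreement between embeddings of two perturbed views of the same image, and the pairwise term asks the within-batch distance structure to agree across views. The function names and exact loss forms here are our own assumptions for illustration, not the paper's definitions:

```python
import numpy as np

def embedding_consistency_loss(z1, z2):
    """Instance-level embedding consistency: penalize disagreement between
    L2-normalized embeddings of two perturbed views of the same samples.
    z1, z2: arrays of shape (batch, dim)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    return float(np.mean(np.sum((z1 - z2) ** 2, axis=1)))

def pairwise_distance_consistency_loss(z1, z2):
    """Pairwise distance consistency: the matrix of pairwise distances
    within a batch should be similar across the two views, regularizing
    the geometry of the embedding space independently of any labels."""
    d1 = np.linalg.norm(z1[:, None, :] - z1[None, :, :], axis=-1)
    d2 = np.linalg.norm(z2[:, None, :] - z2[None, :, :], axis=-1)
    return float(np.mean((d1 - d2) ** 2))
```

Both losses are zero when the two views produce identical embeddings, and neither ever consults a label, which is what makes this kind of regularization immune to label noise by construction.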
Pub Date : 2024-12-30; DOI: 10.1007/s40747-024-01731-6
Haotian Zhou, Yunhan Lin, Huasong Min
Backward chained behavior trees (BTs) are BTs generated through backward chaining: starting from the goal conditions of a task, the approach recursively expands unmet conditions with actions that can achieve them. This provides disturbance rejection for robots at the task level, in the sense that if a disturbance invalidates a condition, that condition is re-expanded with new actions in the same way. However, backward chained BTs fail to handle disturbances optimally in multi-goal tasks. In this paper, we address this limitation by formulating disturbance rejection as a global optimization problem and propose BCBT-D, an approach that endows backward chained BTs with globally optimal disturbance rejection. First, we define Implicit Constraint Conditions (ICCs) as the subsequent goals of nodes in BTs. In BCBT-D, ICCs act both as global constraints on actions, optimizing their execution, and as global heuristics for selecting the optimal actions to achieve unmet conditions. We design various multi-goal tasks with time limits and disturbances for comparison. The experimental results demonstrate that our approach ensures the convergence of backward chained BTs and exhibits superior robustness compared with existing approaches.
{"title":"Backward chained behavior trees with deliberation for multi-goal tasks","authors":"Haotian Zhou, Yunhan Lin, Huasong Min","doi":"10.1007/s40747-024-01731-6","journal":"Complex & Intelligent Systems","PeriodicalIF":5.8,"publicationDate":"2024-12-30"}
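The backward chaining step that the abstract describes (recursively expanding unmet conditions with actions that achieve them) can be sketched as a minimal planner. This shows the generic mechanism backward chained BTs build on, not BCBT-D itself; the `Action` fields and condition strings are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    preconditions: set   # conditions that must hold before the action runs
    postconditions: set  # conditions the action achieves

def backward_chain(goal_conditions, state, actions, plan=None):
    """Recursively expand each unmet goal condition with an action whose
    postconditions achieve it, satisfying that action's own preconditions
    first, so actions end up in a valid execution order."""
    if plan is None:
        plan = []
    for cond in goal_conditions:
        if cond in state:
            continue  # condition already holds; no expansion needed
        # pick an action that achieves the unmet condition
        # (raises StopIteration if no action can achieve it)
        action = next(a for a in actions if cond in a.postconditions)
        backward_chain(action.preconditions, state, actions, plan)
        plan.append(action.name)
        state |= action.postconditions  # assume the action succeeds
    return plan
```

For example, with a `move_to` action that establishes `near_object` and a `grasp` action that requires it, expanding the goal `holding_object` from an empty state yields the plan `["move_to", "grasp"]`. If a disturbance later removed `near_object` from the state, re-running the expansion would insert `move_to` again, which is the condition re-expansion behavior the abstract refers to.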