Complex & Intelligent Systems: Latest Articles (Page 7)

A novel transmission-augmented deep unfolding network with consideration of residual recovery

IF 5.8 | Zone 2, Computer Science | Q1 Computer Science, Artificial Intelligence

Complex & Intelligent Systems

Pub Date : 2025-01-03 DOI: 10.1007/s40747-024-01727-2

Zhijie Zhang, Huang Bai, Ljubiša Stanković, Junmei Sun, Xiumei Li

Compressive sensing (CS) has been widely applied in signal processing, especially for image reconstruction tasks. CS simplifies the sampling and compression procedures but shifts the difficulty to the nonlinear reconstruction. Traditional CS reconstruction algorithms are usually iterative and rest on a complete theoretical foundation. Nevertheless, these iterative algorithms suffer from high computational complexity. Popular deep network-based methods can achieve high-precision CS reconstruction at satisfactory speed but lack theoretical analysis and interpretability. To combine the merits of these two kinds of CS methods, deep unfolding networks (DUNs) have been developed. In this paper, a novel DUN named supervised transmission-augmented network (SuperTA-Net) is proposed. Based on the framework of our previous work PIPO-Net, a multi-channel transmission strategy is put forward to reduce the influence of critical information loss between modules and improve the reliability of data. In addition, to avoid issues such as high information redundancy and high computational burden when too many channels are set, an attention-based supervision scheme is presented to dynamically adjust the weight of each channel and remove redundant information. Furthermore, noting the difference between the original image and the output of SuperTA-Net, a reinforcement network is developed, whose main component, the residual recovery network (RR-Net), is lightweight and can be added to reinforce all kinds of CS reconstruction networks. Experiments on reconstructing CS images demonstrate the effectiveness of the proposed networks.
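For readers unfamiliar with the iterative algorithms that deep unfolding networks unroll, the following is a minimal sketch of one classical example, the iterative shrinkage-thresholding algorithm (ISTA), applied to a small synthetic sparse-recovery problem. It is not SuperTA-Net or PIPO-Net; the problem size, the regularization weight `lam`, the iteration count, and all function names are assumptions chosen only for illustration.

```python
import random


def soft_threshold(v, t):
    """Shrink a scalar toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def matvec(A, x):
    """Compute A @ x for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A.T @ r for a list-of-rows matrix."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def norm(v):
    """Euclidean norm of a vector."""
    return sum(c * c for c in v) ** 0.5


def ista(A, y, lam=0.01, n_iter=300):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by proximal gradient steps."""
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient, so 1 / ||A||_F^2 is a safe (conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = [soft_threshold(x_j - step * g_j, step * lam)
             for x_j, g_j in zip(x, grad)]
    return x


# Synthetic example (assumed sizes): 20 random measurements of a 3-sparse
# signal of length 40.
random.seed(0)
m, n = 20, 40
A = [[random.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
for idx in (3, 17, 29):
    x_true[idx] = 1.0
y = matvec(A, x_true)
x_hat = ista(A, y)
print("measurement residual:", norm([p - q for p, q in zip(matvec(A, x_hat), y)]))
```

A deep unfolding network, broadly speaking, turns each pass of such a loop into a network stage with learned parameters in place of the fixed step size and threshold.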



Citations: 0

PLZero: placeholder based approach to generalized zero-shot learning for multi-label recognition in chest radiographs

IF 5.8 | Zone 2, Computer Science | Q1 Computer Science, Artificial Intelligence

Complex & Intelligent Systems

Pub Date : 2025-01-02 DOI: 10.1007/s40747-024-01717-4

Chengrong Yang, Qiwen Jin, Fei Du, Jing Guo, Yujue Zhou

By leveraging large-scale image-text paired data for pre-training, the model can efficiently learn the alignment between images and text, significantly advancing the development of zero-shot learning (ZSL) in the field of intelligent medical image analysis. However, the heterogeneity between cross-modalities, false negatives in image-text pairs, and domain shift phenomena pose challenges, making it difficult for existing methods to effectively learn the deep semantic relationships between images and text. To address these challenges, we propose a multi-label chest X-ray recognition generalized ZSL framework based on placeholder learning, termed PLZero. Specifically, we first introduce a jointed embedding space learning module (JESL) to encourage the model to better capture the diversity among different labels. Secondly, we propose a hallucinated class generation module (HCG), which generates hallucinated classes by feature diffusion and feature fusion based on the visual and semantic features of seen classes, using these hallucinated classes as placeholders for unseen classes. Finally, we propose a hallucinated class-based prototype learning module (HCPL), which leverages contrastive learning to control the distribution of hallucinated classes around seen classes without significant deviation from the original data, encouraging high dispersion of class prototypes for seen classes to create sufficient space for inserting unseen class samples. Extensive experiments demonstrate that our method exhibits sufficient generalization and achieves the best performance across three classic and challenging chest X-ray datasets: NIH Chest X-ray 14, CheXpert, and ChestX-Det10. Notably, our method outperforms others even when the number of unseen classes exceeds the experimental settings of other methods. The codes are available at: https://github.com/jinqiwen/PLZero.



Citations: 0

MSM-TDE: multi-scale semantics mining and tiny details enhancement network for retinal vessel segmentation

IF 5.8 | Zone 2, Computer Science | Q1 Computer Science, Artificial Intelligence

Complex & Intelligent Systems

Pub Date : 2025-01-02 DOI: 10.1007/s40747-024-01714-7

Hongbin Zhang, Jin Zhang, Xuan Zhong, Ya Feng, Guangli Li, Xiong Li, Jingqin Lv, Donghong Ji

Retinal image segmentation is crucial for the early diagnosis of some diseases like diabetes and hypertension. Current methods face many challenges, such as inadequate multi-scale semantics and insufficient global information. In view of this, we propose a network called multi-scale semantics mining and tiny details enhancement (MSM-TDE). First, a multi-scale feature input module is designed to capture multi-scale semantics information from the source. Then a fresh multi-scale attention guidance module is constructed to mine local multi-scale semantics while a global semantics enhancement module is proposed to extract global multi-scale semantics. Additionally, an auxiliary vessel detail enhancement branch using dynamic snake convolution is built to enhance the tiny vessel details. Extensive experimental results on four public datasets validate the superiority of MSM-TDE, which obtains competitive performance with satisfactory model complexity. Notably, this study provides an innovative idea of multi-scale semantics mining by diverse methods.



Citations: 0

APDL: an adaptive step size method for white-box adversarial attacks

IF 5.8 | Zone 2, Computer Science | Q1 Computer Science, Artificial Intelligence

Complex & Intelligent Systems

Pub Date : 2025-01-02 DOI: 10.1007/s40747-024-01748-x

Jiale Hu, Xiang Li, Changzheng Liu, Ronghua Zhang, Junwei Tang, Yi Sun, Yuedong Wang

Recent research has shown that deep learning models are vulnerable to adversarial attacks, including gradient attacks, which can lead to incorrect outputs. The existing gradient attack methods typically rely on repetitive multistep strategies to improve their attack success rates, resulting in longer training times and severe overfitting. To address these issues, we propose an adaptive perturbation-based gradient attack method with dual-loss optimization (APDL). This method adaptively adjusts the single-step perturbation magnitude based on an exponential distance function, thereby accelerating the convergence process. APDL achieves convergence in fewer than 10 iterations, outperforming the traditional nonadaptive methods and achieving a high attack success rate with fewer iterations. Furthermore, to increase the transferability of gradient attacks such as APDL across different models and reduce the effects of overfitting on the training model, we introduce a triple-differential logit fusion (TDLF) method grounded in knowledge distillation principles. This approach mitigates the edge effects associated with gradient attacks by adjusting the hardness and softness of labels. Experiments conducted on ImageNet-compatible datasets demonstrate that APDL is significantly faster than the commonly used nonadaptive methods, whereas the TDLF method exhibits strong transferability.
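To make the idea of an adaptive per-step perturbation concrete, here is a minimal sketch of a sign-gradient attack on a toy linear classifier in which the step size shrinks exponentially as the sample nears the decision boundary. The step rule `eps * (1 - exp(-margin))`, the floor `min_step`, the classifier weights, and every function name are hypothetical; the abstract does not give APDL's exponential distance function or its dual-loss objective, so this only illustrates the general mechanism.

```python
import math


def margin(w, b, x, label):
    """Signed margin of a linear classifier; positive means correctly classified."""
    return label * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def adaptive_sign_attack(w, b, x, label, eps=0.5, max_iter=10, min_step=0.01):
    """Sign-gradient attack whose step shrinks as the margin shrinks.

    The rule step = eps * (1 - exp(-margin)) is a hypothetical stand-in for
    an exponential distance function; it is not the formula from the paper.
    Returns the perturbed input and the number of steps taken.
    """
    x_adv = list(x)
    for it in range(1, max_iter + 1):
        m = margin(w, b, x_adv, label)
        if m < 0:
            return x_adv, it - 1  # already misclassified, stop early
        step = max(min_step, eps * (1.0 - math.exp(-m)))
        # For a linear model the gradient of the margin w.r.t. x is label * w,
        # so moving against its sign lowers the margin fastest under L-inf.
        moved = [xi - step * label * sign(wi) for xi, wi in zip(x_adv, w)]
        # Project back into the L-inf ball of radius eps around the clean input.
        x_adv = [min(max(mi, xi - eps), xi + eps) for mi, xi in zip(moved, x)]
    return x_adv, max_iter


# Toy classifier and input (assumed values).
w, b = [1.0, -2.0, 0.5], 0.1
x_clean, label = [0.5, -0.2, 0.3], 1
x_adv, n_steps = adaptive_sign_attack(w, b, x_clean, label)
print("steps:", n_steps, "final margin:", margin(w, b, x_adv, label))
```

A large step is taken while the sample is far from the boundary and a small one when it is close, which is the intuition behind reaching a successful perturbation in few iterations.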



Citations: 0

MM-HiFuse: multi-modal multi-task hierarchical feature fusion for esophagus cancer staging and differentiation classification

IF 5.8 | Zone 2, Computer Science | Q1 Computer Science, Artificial Intelligence

Complex & Intelligent Systems

Pub Date : 2025-01-02 DOI: 10.1007/s40747-024-01708-5

Xiangzuo Huo, Shengwei Tian, Long Yu, Wendong Zhang, Aolun Li, Qimeng Yang, Jinmiao Song

Esophageal cancer is a globally significant but understudied type of cancer with high mortality rates. The staging and differentiation of esophageal cancer are crucial factors in determining the prognosis and surgical treatment plan for patients, as well as improving their chances of survival. Endoscopy and histopathological examination are considered as the gold standard for esophageal cancer diagnosis. However, some previous studies have employed deep learning-based methods for esophageal cancer analysis, which are limited to single-modal features, resulting in inadequate classification results. In response to these limitations, multi-modal learning has emerged as a promising alternative for medical image analysis tasks. In this paper, we propose a hierarchical feature fusion network, MM-HiFuse, for multi-modal multitask learning to improve the classification accuracy of esophageal cancer staging and differentiation level. The proposed architecture combines low-level to deep-level features of both pathological and endoscopic images to achieve accurate classification results. The key characteristics of MM-HiFuse include: (i) a parallel hierarchy of convolution and self-attention layers specifically designed for pathological and endoscopic image features; (ii) a multi-modal hierarchical feature fusion module (MHF) and a new multitask weighted combination loss function. The benefits of these features are the effective extraction of multi-modal representations at different semantic scales and the mutual complementarity of the multitask learning, leading to improved classification performance. Experimental results demonstrate that MM-HiFuse outperforms single-modal methods in esophageal cancer staging and differentiation classification. Our findings provide evidence for the early diagnosis and accurate staging of esophageal cancer and serve as a new inspiration for the application of multi-modal multitask learning in medical image analysis. 
Code is available at https://github.com/huoxiangzuo/MM-HiFuse.



Citations: 0

Implicit link prediction based on extended social graph

IF 5.8 | Zone 2, Computer Science | Q1 Computer Science, Artificial Intelligence

Complex & Intelligent Systems

Pub Date : 2024-12-31 DOI: 10.1007/s40747-024-01736-1

Ling Xing, Jinxin Liu, Qi Zhang, Honghai Wu, Huahong Ma, Xiaohui Zhang

Link prediction infers the likelihood of a connection between two nodes based on network structural information, aiming to foresee latent relationships within the network. In social networks, nodes typically represent users, and links denote the relationships between users. However, some user nodes in social networks are hidden due to unknown or incomplete link information. The prediction of implicit links between these nodes and other user nodes is hampered by incomplete network structures and partial node information, affecting the accuracy of link prediction. To address these issues, this paper introduces an implicit link prediction algorithm based on extended social graph (ILP-ESG). The algorithm completes user attribute information through a multi-task fusion attribute inference framework built on associative learning. Subsequently, an extended social graph is constructed based on user attribute relations, social relations, and discourse interaction relations, enriching user nodes with comprehensive representational information. A semi-supervised graph autoencoder is then employed to extract features from the three types of relationships in the extended social graph, obtaining feature vectors that effectively represent the multidimensional relationship information of users. This facilitates the inference of potential implicit links between nodes and the prediction of hidden user relationships with others. The algorithm is validated on real datasets. On the Facebook dataset, it improves the AUC and Precision metrics by an average of 5.17% and 9.25%, respectively, compared to the baseline method; on the Instagram dataset, it improves them by 7.71% and 16.16%, respectively. The algorithm also exhibits good stability and robustness, ensuring the accuracy of link prediction.
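As a point of reference for what structure-based link prediction means in its simplest form, the sketch below ranks the missing node pairs of a tiny graph by the Jaccard coefficient of their neighbor sets. This is a classical baseline, not ILP-ESG: it uses neither attribute inference nor a graph autoencoder, and the graph, node names, and function names are invented for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbors in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbors divided by total distinct neighbors of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def rank_missing_links(edges):
    """Score every unconnected node pair and sort from most to least likely."""
    adj = build_adjacency(edges)
    candidates = [(u, v) for u, v in combinations(sorted(adj), 2)
                  if v not in adj[u]]
    scored = [((u, v), jaccard(adj, u, v)) for u, v in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy social graph (assumed): users a..e and their observed relationships.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
ranking = rank_missing_links(edges)
for pair, score in ranking:
    print(pair, round(score, 3))
```

Such neighbor-overlap scores depend entirely on observed links, which is why they degrade for hidden nodes with incomplete link information, the setting the abstract targets.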



Citations: 0

Quantum theory-inspired inter-sentence semantic interaction model for textual adversarial defense

IF 5.8 | Zone 2, Computer Science | Q1 Computer Science, Artificial Intelligence

Complex & Intelligent Systems

Pub Date : 2024-12-30 DOI: 10.1007/s40747-024-01733-4

Jiacheng Huang, Long Chen, Xiaoyin Yi, Ning Yu

Deep neural networks have a recognized susceptibility to diverse forms of adversarial attacks in the field of natural language processing. This vulnerability poses substantial security risks and erodes users' trust in artificial intelligence applications. Meanwhile, quantum theory-inspired models that represent word composition as a quantum mixture of words have modeled non-linear semantic interaction. However, existing models in the literature do not consider the non-linear semantic interaction between sentences, and so do not exploit the potential of the quantum probabilistic description for improving robustness in adversarial settings. In the present study, a novel quantum theory-inspired inter-sentence semantic interaction model is proposed for enhancing adversarial robustness via fusing contextual semantics. More specifically, the study analyzes why humans are able to understand textual adversarial examples and observes a crucial point: humans are adept at associating information from the context to comprehend a paragraph. Guided by this insight, the input text is segmented into subsentences, with the model simulating contextual comprehension by representing each subsentence as a particle within a mixture system, utilizing a density matrix to model inter-sentence interactions. A loss function integrating cross-entropy and orthogonality losses is employed to encourage the orthogonality of measurement states. Comprehensive experiments are conducted to validate the efficacy of the proposed methodology, and the results underscore its superiority in accuracy over baseline models, and even over commercial applications based on large language models, across diverse adversarial attack scenarios, showing the potential of the proposed approach in enhancing the robustness of neural networks under adversarial attacks.
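The density-matrix description mentioned above can be illustrated with a minimal sketch, assuming real-valued subsentence embeddings and fixed mixture weights. It builds rho = sum_i p_i |v_i><v_i| from unit vectors, reads off measurement probabilities <m|rho|m>, and computes a simple orthogonality penalty on a set of measurement states. The embeddings, weights, the exact form of the penalty, and all function names are assumptions; the abstract does not specify the paper's construction or its loss.

```python
import math


def normalize(v):
    """Scale a real vector to unit Euclidean length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def outer(u, v):
    """Outer product |u><v| as a list-of-rows matrix."""
    return [[ui * vj for vj in v] for ui in u]


def density_matrix(vectors, weights):
    """rho = sum_i p_i |v_i><v_i| with unit vectors v_i and weights summing to 1."""
    dim = len(vectors[0])
    total = sum(weights)
    rho = [[0.0] * dim for _ in range(dim)]
    for vec, wgt in zip(vectors, weights):
        unit = normalize(vec)
        proj = outer(unit, unit)
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += (wgt / total) * proj[i][j]
    return rho


def measure(rho, state):
    """Probability <m|rho|m> of observing the unit-normalized state m."""
    m = normalize(state)
    dim = len(m)
    return sum(m[i] * rho[i][j] * m[j] for i in range(dim) for j in range(dim))


def orthogonality_penalty(states):
    """Squared Frobenius distance between the states' Gram matrix and identity."""
    unit = [normalize(s) for s in states]
    return sum(
        (sum(a * b for a, b in zip(u, v)) - (1.0 if i == j else 0.0)) ** 2
        for i, u in enumerate(unit)
        for j, v in enumerate(unit)
    )


# Hypothetical 3-dimensional embeddings for three subsentences.
subsentences = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [3.0, 0.0, 1.0]]
rho = density_matrix(subsentences, weights=[0.5, 0.3, 0.2])
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs = [measure(rho, m) for m in basis]
print("trace:", sum(rho[i][i] for i in range(3)), "probabilities:", probs)
```

Because the measurement states here form an orthonormal basis, the probabilities sum to one and the penalty is zero; non-orthogonal states would yield a positive penalty, which is the quantity an orthogonality loss of this kind would push down.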



Citations: 0

CDR-Detector: a chronic disease risk prediction model combining pre-training with deep reinforcement learning

IF 5.8 · Zone 2 Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Complex & Intelligent Systems

Pub Date : 2024-12-30 DOI: 10.1007/s40747-024-01697-5

Shaofu Lin, Shiwei Zhou, Han Jiao, Mengzhen Wang, Haokang Yan, Peng Dou, Jianhui Chen

Chronic disease risk prediction based on electronic health records (EHRs) is an important research direction in Internet healthcare. Current studies mainly focus on developing well-designed deep learning models that predict disease risk from large-scale, high-quality longitudinal EHR data. However, in real-world scenarios, people's medical habits and the low prevalence of diseases often lead to few-shot and imbalanced longitudinal EHR data. This has become an urgent challenge for EHR-based chronic disease risk prediction. To address this challenge, this study combines EHR-based pre-training and deep reinforcement learning to develop a novel chronic disease risk prediction model called CDR-Detector. The model adopts the Q-learning architecture with a custom reward function. To improve the few-shot learning ability of the model, a self-adaptive EHR-based pre-training model with two new pre-training tasks is developed to mine valuable dependencies from single-visit EHR data. To address data imbalance, a dual experience replay strategy is implemented to help the model select representative data samples and accelerate convergence on the imbalanced EHR data. A set of experiments has been conducted on real personal physical examination data. Experimental results show that, compared with existing state-of-the-art methods, the proposed CDR-Detector achieves better accuracy and robustness on few-shot and imbalanced EHR data.
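The combination of Q-learning, a custom reward, and dual experience replay can be illustrated with a small sketch. This is not the CDR-Detector design: it assumes a tabular Q-function instead of a deep network, a reward that weights the rare (positive) class more heavily, and two replay buffers sampled in equal shares. The names `reward`, `DualReplay`, and `q_update` and the value of `minority_bonus` are hypothetical.

```python
import random
from collections import deque

def reward(action, label, minority_bonus=2.0):
    """Assumed reward: correct/incorrect predictions on the rare class (label 1) count more."""
    scale = minority_bonus if label == 1 else 1.0
    return scale if action == label else -scale

class DualReplay:
    """Two buffers, one per class, sampled in equal shares to counter imbalance."""

    def __init__(self, capacity=1000, seed=0):
        self.buffers = {0: deque(maxlen=capacity), 1: deque(maxlen=capacity)}
        self.rng = random.Random(seed)

    def add(self, transition, label):
        self.buffers[label].append(transition)

    def sample(self, batch_size):
        half = batch_size // 2
        batch = []
        for buf in self.buffers.values():
            if buf:
                # Sampling with replacement lets a small buffer fill its share.
                batch.extend(self.rng.choices(list(buf), k=half))
        return batch

def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state]) if next_state is not None else 0.0
    q[state][action] += alpha * (r + gamma * best_next - q[state][action])

# One terminal update for a correctly predicted rare-class sample.
q = {"s0": [0.0, 0.0]}
q_update(q, "s0", 1, reward(1, 1), None)

# Nine common-class transitions and one rare-class transition.
replay = DualReplay(seed=0)
for i in range(9):
    replay.add(("s%d" % i, 0, 1.0, None, 0), 0)
replay.add(("rare", 1, 2.0, None, 1), 1)
batch = replay.sample(4)
print(q["s0"], [t[-1] for t in batch])
```

Although only one in ten stored transitions belongs to the rare class, half of each sampled batch does, which is the balancing effect a dual replay scheme of this kind aims for.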



Citations: 0

Mining label-free consistency regularization for noisy facial expression recognition

IF 5.8 · Zone 2 Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Complex & Intelligent Systems

Pub Date : 2024-12-30 DOI: 10.1007/s40747-024-01722-7

Yumei Tan, Haiying Xia, Shuxiang Song

Noisy labels are unavoidable in facial expression recognition (FER) tasks and significantly hinder FER performance in real-world scenarios. Recent advances tackle this problem by leveraging uncertainty for sample partitioning or by constructing label distributions. However, these approaches depend primarily on labels, leading to confirmation bias and performance degradation. We argue that mining both label-independent features and label-dependent information can mitigate the confirmation bias induced by noisy labels. In this paper, we propose mining label-free consistency regularization (MCR), a simple yet effective approach to learning representations that are robust to noisy labels. The proposed MCR incorporates three label-free consistency regularizations: instance-level embedding consistency regularization, pairwise distance consistency regularization, and neighbour consistency regularization. First, we employ instance-level embedding consistency regularization to learn instance-level discriminative information from identical facial samples under perturbations in an unsupervised manner. This helps mitigate inherent noise in the data. Next, a pairwise distance consistency regularization is constructed to regularize the classifier and alleviate the bias induced by noisy labels. Finally, we use the neighbour consistency regularization to further strengthen the discriminative capability of the model against noise. Benefiting from these three label-free consistency regularizations, MCR can learn discriminative and noise-robust representations. Extensive experimental results demonstrate the superior performance of MCR on three popular in-the-wild facial expression datasets: RAF-DB, FERPlus, and AffectNet. Moreover, MCR demonstrates superior generalization capability on other datasets with noisy labels, such as CIFAR100 and Tiny-ImageNet.
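As a rough illustration of two of the label-free terms named above, the sketch below computes an instance-level embedding consistency term and a pairwise distance consistency term between two perturbed views of the same batch. The exact loss definitions used by MCR are not given in the abstract; the squared-error forms, the Euclidean distance, and the function names here are assumptions.

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def instance_consistency(view_a, view_b):
    """Mean squared distance between the two embeddings of each sample."""
    return sum(l2(a, b) ** 2 for a, b in zip(view_a, view_b)) / len(view_a)

def pairwise_distance_consistency(view_a, view_b):
    """Mean squared gap between the pairwise distance matrices of the two views."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gap = l2(view_a[i], view_a[j]) - l2(view_b[i], view_b[j])
            total += gap ** 2
    return total / (n * n)

# Embeddings of two samples under two perturbations (toy values).
view_a = [[0.0, 0.0], [3.0, 4.0]]
view_b = [[1.0, 1.0], [4.0, 5.0]]  # view_a shifted by (1, 1)

inst = instance_consistency(view_a, view_b)
pair = pairwise_distance_consistency(view_a, view_b)
print(inst, pair)
```

Neither function takes a label argument, which is the sense in which such terms are label-free. The toy views also show that the two terms measure different things: a uniform shift changes every embedding, so the instance term is positive, but it preserves all pairwise distances, so the pairwise term is zero.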



Citations: 0

Backward chained behavior trees with deliberation for multi-goal tasks

IF 5.8 · Zone 2 Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Complex & Intelligent Systems

Pub Date : 2024-12-30 DOI: 10.1007/s40747-024-01731-6

Haotian Zhou, Yunhan Lin, Huasong Min

Backward chained behavior trees (BTs) are BTs generated through backward chaining. Starting from the goal conditions of a task, this approach recursively expands each unmet condition with actions that aim to achieve it. It provides disturbance rejection for robots at the task level: if a disturbance changes the state of a condition, that condition is expanded with new actions in the same way. However, backward chained BTs fail to handle disturbances optimally in multi-goal tasks. In this paper, we address this by formulating disturbance handling as a global optimization problem and propose an approach termed BCBT-D, which endows backward chained BTs with the ability to achieve globally optimal disturbance rejection. First, we define Implicit Constraint Conditions (ICCs) as the subsequent goals of nodes in BTs. In BCBT-D, ICCs act as global constraints on actions to optimize their execution and as global heuristics for selecting optimal actions that can achieve unmet conditions. We design various multi-goal tasks with time limits and disturbances for comparison. The experimental results demonstrate that our approach ensures the convergence of backward chained BTs and exhibits superior robustness compared to existing approaches.
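The basic backward-chaining expansion described at the start of this abstract can be sketched as follows. This illustrates plain backward chaining only, not BCBT-D or its ICCs. The action models, the tuple-based tree encoding, and the simplification that actions succeed instantly (no RUNNING status) are all assumptions made for the example.

```python
# Hypothetical action models: name -> (preconditions, effects).
ACTIONS = {
    "pick": (["near_object"], ["holding"]),
    "goto_object": ([], ["near_object"]),
}

def expand(condition, actions, depth=0, max_depth=5):
    """Build Fallback(condition, Sequence(preconditions..., action)) for a condition.

    Conditions that no action can achieve are left as plain condition checks.
    """
    for name, (pre, effects) in actions.items():
        if condition in effects and depth < max_depth:
            subtrees = [expand(p, actions, depth + 1, max_depth) for p in pre]
            return ("fallback", ("condition", condition),
                    ("sequence", *subtrees, ("action", name)))
    return ("condition", condition)

def tick(node, state, actions):
    """Evaluate the tree once; actions apply their effects immediately (simplification)."""
    kind = node[0]
    if kind == "condition":
        return node[1] in state
    if kind == "action":
        state.update(actions[node[1]][1])
        return True
    if kind == "fallback":
        # Succeeds on the first child that succeeds.
        return any(tick(child, state, actions) for child in node[1:])
    if kind == "sequence":
        # Fails on the first child that fails.
        return all(tick(child, state, actions) for child in node[1:])
    raise ValueError("unknown node type: %s" % kind)

tree = expand("holding", ACTIONS)
state = set()
first = tick(tree, state, ACTIONS)   # moves to the object, then picks it up

state.discard("holding")             # a disturbance: the object is dropped
second = tick(tree, state, ACTIONS)  # the same tree re-achieves the goal
print(tree, first, second, state)
```

Because every condition is wrapped in a fallback, a disturbance that invalidates a condition makes the next tick fall through to the action that restores it, which is the task-level disturbance rejection the abstract refers to.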



Citations: 0