
Latest publications in CAAI Transactions on Intelligence Technology

Approximate-Guided Representation Learning in Vision Transformer
IF 7.3 | CAS Region 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-07-15 | DOI: 10.1049/cit2.70041 | Vol. 10, Issue 5, pp. 1459-1477
Kaili Wang, Xinwei Sun, Huijie He, Fenhua Bai, Tao Shen

In recent years, the transformer model has demonstrated excellent performance in computer vision (CV) applications. The key lies in its guided representation attention mechanism, which uses dot products to depict complex feature relationships and comprehensively understands contextual semantics to obtain feature weights. Feature enhancement is then implemented by guiding the target matrix with these feature weights. However, feature uncertainty and inconsistency are widespread, and they tend to confuse the relationship descriptions produced by dot-product attention mechanisms. To solve this problem, this paper proposes a novel approximate-guided representation learning methodology for vision transformers. A kernelised matroid fuzzy rough set is defined, wherein the closed sets inside the kernelised fuzzy information granules of matroid structures constitute the lower-approximation subspace of rough sets. The kernel relation is thus employed to characterise image feature granules, which are reconstructed according to the independent sets of matroid theory. Then, drawing on the properties of closed sets within matroids, feature attention weights are formed from the lower approximation to realise approximate guidance of features. The approximate-guided representation mechanism can be flexibly deployed as a plug-and-play component in a wide range of CV tasks. Extensive empirical results demonstrate that the proposed method outperforms the majority of advanced prevalent models, especially in terms of robustness.
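The dot-product attention the abstract builds on can be sketched in a few lines. This is a minimal, dependency-free illustration of the standard mechanism, not the authors' matroid-based variant; the toy matrices Q, K, V are invented for the example.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot_product_attention(Q, K, V):
    """Scaled dot-product attention on small Python lists.

    Q, K, V are lists of d-dimensional row vectors. Attention weights
    from QK^T / sqrt(d) guide the value matrix V, i.e. the 'guided
    representation' step the abstract describes.
    """
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, V)) for i in range(len(V[0]))])
    return out

# Each output row is a convex combination of the rows of V.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(dot_product_attention(Q, K, V))
```

Because the weights sum to one, every output coordinate stays inside the range spanned by the corresponding column of V.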

Citations: 0
RDHNet: Reversible Data Hiding Method for Securing Colour Images Using AlexNet and Watershed Transform in a Fusion Domain
IF 7.3 | CAS Region 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-07-13 | DOI: 10.1049/cit2.70038 | Vol. 10, Issue 5, pp. 1422-1445
Mohamed Meselhy Eltoukhy, Faisal S. Alsubaei, Mostafa M. Abdel-Aziz, Khalid M. Hosny

Medical images play a crucial role in diagnosis, treatment procedures and overall healthcare. Nevertheless, they also pose substantial risks to patient confidentiality and safety, and safeguarding the confidentiality of patients' data has become an urgent and practical concern. We present a novel approach to reversible data hiding in colour medical images. In a hybrid domain, we employ AlexNet, tuned with the watershed transform (WST) and L-shaped fractal Tromino encryption. Our approach commences by constructing the host image's feature vector using a pre-trained AlexNet model. Next, we use the watershed transform to convert the extracted feature vector into a topographic-map vector, which we then encrypt using an L-shaped fractal Tromino cryptosystem. We embed the secret image in the transformed image vector using a histogram-based embedding strategy to enhance payload and visual fidelity. In the absence of attacks, RDHNet exhibits robust performance: the original image can be perfectly recovered and the stego image remains visually appealing, with an average PSNR of 73.14 dB, an SSIM of 0.9999 and perfect values of NC = 1 and BER = 0. The proposed RDHNet demonstrates a robust ability to withstand detrimental geometric and noise-adding attacks as well as various steganalysis methods. Furthermore, RDHNet demonstrates efficacy in tackling contemporary confidentiality issues.
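The PSNR figure quoted above (73.14 dB) is the standard peak signal-to-noise ratio; a minimal sketch of the metric, using a hypothetical 2x2 image pair rather than the paper's data, is:

```python
import math

def psnr(original, stego, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-size greyscale images
    given as lists of pixel rows. Higher means the stego image is closer
    to the original; this is the standard definition, not the authors'
    full evaluation pipeline.
    """
    n = 0
    se = 0.0
    for row_o, row_s in zip(original, stego):
        for o, s in zip(row_o, row_s):
            se += (o - s) ** 2
            n += 1
    mse = se / n
    if mse == 0:
        return float("inf")  # identical images (perfect reversibility)
    return 10.0 * math.log10(max_val ** 2 / mse)

img = [[10, 20], [30, 40]]
stego = [[10, 21], [30, 40]]  # one pixel perturbed by the embedded payload
print(round(psnr(img, stego), 2))  # -> 54.15
```

A perfectly reversed image gives infinite PSNR, which is why reversible schemes report reconstruction separately from stego-image quality.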

Citations: 0
Improving 3D Object Detection in Neural Radiance Fields With Channel Attention
IF 7.3 | CAS Region 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-07-10 | DOI: 10.1049/cit2.70045 | Vol. 10, Issue 5, pp. 1446-1458
Minling Zhu, Yadong Gong, Dongbing Gu, Chunwei Tian

In recent years, 3D object detection using neural radiance fields (NeRF) has advanced significantly, yet challenges remain in effectively utilising the density field. Current methods often treat NeRF as a geometry learning tool or rely on volume rendering, neglecting the density field's potential and feature dependencies. To address this, we propose NeRF-C3D, a novel framework incorporating a multi-scale feature fusion module with channel attention (MFCA). MFCA leverages channel attention to model feature dependencies, dynamically adjusting channel weights during fusion to enhance important features and suppress redundancy. This optimises the density field representation and improves feature discriminability. Experiments on 3D-FRONT, Hypersim, and ScanNet demonstrate NeRF-C3D's superior performance, validating MFCA's effectiveness in capturing feature relationships and showcasing its innovation in NeRF-based 3D detection.
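Channel attention of the Squeeze-and-Excitation style that MFCA builds on can be sketched as follows. The feature maps and weights below are toy values chosen for illustration, not the network's learned parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps, w1, w2):
    """Squeeze-and-Excitation-style channel re-weighting on toy data.

    feature_maps: list of C channels, each a flat list of activations.
    w1, w2: toy fully-connected weights (reduce C -> C/r, expand C/r -> C).
    Channel descriptors from global average pooling pass through a
    bottleneck MLP; sigmoid gates then scale each channel, which is the
    general mechanism MFCA's dynamic channel weighting builds on.
    """
    # Squeeze: global average pooling per channel.
    z = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: bottleneck MLP with ReLU, then sigmoid gates.
    h = [max(0.0, sum(zi * w for zi, w in zip(z, col))) for col in w1]
    s = [sigmoid(sum(hi * w for hi, w in zip(h, col))) for col in w2]
    # Scale: re-weight every activation in each channel.
    return [[a * si for a in ch] for ch, si in zip(feature_maps, s)]

fmap = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]  # C = 3 toy channels
w1 = [[0.5, 0.5, 0.0]]                        # reduce 3 -> 1
w2 = [[1.0], [0.0], [-1.0]]                   # expand 1 -> 3
out = channel_attention(fmap, w1, w2)
```

With these weights the first channel is boosted relative to the third, showing how the gates emphasise some channels and suppress others.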

Citations: 0
Deep Learning Approach for Automated Estimation of 3D Vertebral Orientation of the Lumbar Spine
IF 7.3 | CAS Region 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-07-10 | DOI: 10.1049/cit2.70033 | Vol. 10, Issue 5, pp. 1306-1319
Nanfang Xu, Shanshan Liu, Yuepeng Chen, Kailai Zhang, Chenyi Guo, Cheng Zhang, Fei Xu, Qifeng Lan, Wanyi Fu, Xingyu Zhou, Bo Zhao, Aodong He, Xiangling Fu, Ji Wu, Weishi Li

Lumbar degenerative disc diseases constitute a major contributor to lower back pain. In pursuit of an enhanced understanding of lumbar degenerative pathology and the development of more effective treatment modalities, the application of precise measurement techniques for lumbar segment kinematics is imperative. This study aims to pioneer a novel automated lumbar spine orientation estimation method using deep learning techniques, facilitating automatic 2D–3D pre-registration of the lumbar spine during physiological movements and enhancing both the efficiency of image registration and the accuracy of spinal segment kinematic measurements. A total of 12 asymptomatic volunteers were enrolled and captured in two oblique views with seven different postures. The images were used for deep learning model development, training and evaluation. The model was composed of a segmentation module using Mask R-CNN and an estimation module using a ResNet50 architecture with a Squeeze-and-Excitation module. The cosine of the angle between the prediction vector and the ground-truth vector was used to quantify model performance. Data from another two prospectively recruited asymptomatic volunteers were used to compare the time cost of model-assisted registration against manual registration without a model. The cosine values of the vector deviation angles along the three axes of the Cartesian coordinate system were 0.9667 ± 0.004, 0.9593 ± 0.0047 and 0.9828 ± 0.0025, respectively. The angular deviation between the intermediate vector obtained from the three direction vectors and the ground truth was 10.7103 ± 0.7466. The results show the consistency and reliability of the model's predictions across different experiments and axes, and demonstrate that our approach significantly reduces registration time (3.47 ± 0.90 min vs. 8.10 ± 1.60 min, p < 0.001), enhances efficiency, and broadens its utilisation in clinical research on kinematic measurements.
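The per-axis score reported above is the cosine of the angle between a predicted orientation vector and ground truth; a minimal sketch (the vectors below are invented for illustration, not study data) is:

```python
import math

def cosine_of_angle(u, v):
    """Cosine of the angle between a predicted orientation vector and
    the ground-truth vector: values near 1 mean near-perfect alignment,
    as in the per-axis scores (e.g. 0.9667) the study reports."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

pred = [0.98, 0.05, 0.17]   # hypothetical predicted direction
truth = [1.0, 0.0, 0.0]     # hypothetical ground-truth direction
deg = math.degrees(math.acos(cosine_of_angle(pred, truth)))
print(cosine_of_angle(pred, truth), deg)
```

Converting the cosine back to an angle, as in the last line, gives the kind of angular-deviation figure the abstract also reports.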

Citations: 0
VSMI²-PANet: Versatile Scale-Malleable Image Integration and Patch-Wise Attention Network With Transformer for Lung Tumour Segmentation Using Multi-Modal Imaging Techniques
IF 7.3 | CAS Region 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-07-09 | DOI: 10.1049/cit2.70039 | Vol. 10, Issue 5, pp. 1376-1393
Nayef Alqahtani, Arfat Ahmad Khan, Rakesh Kumar Mahendran, Muhammad Faheem

Lung cancer (LC) is a major cancer that accounts for high mortality rates worldwide. Doctors utilise many imaging modalities to identify lung tumours and their severity at earlier stages. Nowadays, machine learning (ML) and deep learning (DL) methodologies are utilised for the robust detection and prediction of lung tumours. Recently, multi-modal imaging emerged as a robust technique for lung tumour detection by combining various imaging features. To this end, we propose a novel multi-modal imaging technique named the versatile scale-malleable image integration and patch-wise attention network (VSMI²-PANet), which adopts three imaging modalities: computed tomography (CT), magnetic resonance imaging (MRI) and single-photon emission computed tomography (SPECT). The designed model accepts input from CT and MRI images and passes it to the VSMI² module, which is composed of three sub-modules: an image cropping module, a scale-malleable convolution layer (SMCL) and a PANet module. CT and MRI images pass through the image cropping module in parallel, which crops meaningful image patches and provides them to the SMCL module. The SMCL module is composed of adaptive convolutional layers that investigate those patches in parallel while preserving spatial information. The output from the SMCL is then fused and provided to the PANet module, which examines the fused patches by analysing the height, width and channels of each image patch. As a result, it outputs high-resolution spatial attention maps indicating the locations of suspicious tumours. These maps are then provided as input to the backbone module, which uses a light wave transformer (LWT) to segment lung tumours into three classes: normal, benign and malignant. In addition, the LWT also accepts a SPECT image as input to capture variations precisely when segmenting the lung tumours. The performance of the proposed model is validated using several performance metrics, such as accuracy, precision, recall, F1-score and the AUC curve, and the results show that the proposed work outperforms existing approaches.
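Patch-wise processing of the kind the image cropping module performs can be illustrated with non-overlapping patch extraction. The authors' exact cropping strategy is not specified in the abstract, so this is only a generic sketch on a toy image.

```python
def crop_patches(image, patch_h, patch_w):
    """Split a 2D image (a list of pixel rows) into non-overlapping
    patch_h x patch_w patches, row-major order. A stand-in for the kind
    of patch cropping that feeds the SMCL module, not the paper's
    actual cropping criterion."""
    patches = []
    for top in range(0, len(image) - patch_h + 1, patch_h):
        for left in range(0, len(image[0]) - patch_w + 1, patch_w):
            patches.append([row[left:left + patch_w]
                            for row in image[top:top + patch_h]])
    return patches

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy image
print(len(crop_patches(img, 2, 2)))  # -> 4
```

Each patch keeps its local spatial layout, which is what lets later convolutional layers process patches while preserving spatial information.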

Citations: 0
Exploring a Hybrid Convolutional Framework for Camouflage Target Classification in Land-Based Hyperspectral Images
IF 7.3 | CAS Region 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-07-09 | DOI: 10.1049/cit2.70051 | Vol. 10, Issue 5, pp. 1559-1572
Jiale Zhao, Dan Fang, Jianghu Deng, Jiaju Ying, Yudan Chen, Guanglong Wang, Bing Zhou

In recent years, camouflage technology has evolved from single-spectral-band applications to multifunctional and multispectral implementations. Hyperspectral imaging has emerged as a powerful technique for target identification due to its capacity to capture both spectral and spatial information. The advancement of imaging spectroscopy technology has significantly enhanced reconnaissance capabilities, offering substantial advantages in camouflaged target classification and detection. However, the increasing spectral similarity between camouflaged targets and their backgrounds has significantly compromised detection performance in specific scenarios. Conventional feature extraction methods are often limited to single, shallow spectral or spatial features, failing to extract deep features and consequently yielding suboptimal classification accuracy. To address these limitations, this study proposes an innovative 3D-2D convolutional neural network architecture incorporating depthwise separable convolution (DSC) and attention mechanisms (AM). The framework first applies dimensionality reduction to hyperspectral images and extracts preliminary spectral-spatial features. It then employs an alternating combination of 3D and 2D convolutions for deep feature extraction. For target classification, the LogSoftmax function is implemented. The integration of depthwise separable convolution not only enhances classification accuracy but also substantially reduces model parameters. Furthermore, the attention mechanisms significantly improve the network's ability to represent multidimensional features. Extensive experiments were conducted on a custom land-based hyperspectral image dataset. The results demonstrate remarkable classification accuracy: 98.74% for grassland camouflage, 99.13% for dead-leaf camouflage and 98.94% for wild-grass camouflage. Comparative analysis shows that the proposed framework is outstanding in terms of classification accuracy and robustness for camouflage target classification.
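The parameter savings from depthwise separable convolution follow from a simple count: a standard k×k convolution costs C_in·C_out·k² weights, while its depthwise-separable replacement costs C_in·k² (one depthwise filter per input channel) plus C_in·C_out (the 1×1 pointwise mix). A quick sketch with hypothetical layer sizes:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k):
    """Weight count of a depthwise separable convolution: one k x k
    depthwise filter per input channel, then a 1 x 1 pointwise
    convolution mixing channels. The gap versus conv_params is the
    parameter reduction the abstract attributes to DSC."""
    return c_in * k * k + c_in * c_out

# Layer sizes chosen only for illustration.
std = conv_params(64, 128, 3)  # 73728 weights
sep = dsc_params(64, 128, 3)   # 8768 weights, roughly 8.4x fewer
print(std, sep, round(std / sep, 2))
```

The ratio grows with kernel size and output channels, which is why DSC is a common choice when model size matters.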

Citations: 0
Enhancing Brain MRI Super-Resolution Through Multi-Slice Aware Matching and Fusion
IF 7.3 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-07-04 DOI: 10.1049/cit2.70032
Jie Xiang, Ang Zhao, Xia Li, Xubin Wu, Yanqing Dong, Yan Niu, Xin Wen, Yidi Li

In clinical diagnosis, magnetic resonance imaging (MRI) allows images of different contrasts to be obtained. High-resolution (HR) MRI presents fine anatomical structures, which is important for improving the efficiency of expert diagnosis and realising smart healthcare. However, owing to the cost of scanning equipment and the time required for scanning, obtaining an HR brain MRI is quite challenging. Therefore, to improve image quality, reference-based super-resolution technology has emerged. Nevertheless, existing methods still have some drawbacks: (1) the advantages of different contrast images are not fully utilised; (2) the slice-by-slice scanning nature of MRI is not considered; (3) the ability to capture contextual information and to match and fuse multi-scale, multi-contrast features is lacking. In this paper, we propose the multi-slice aware matching and fusion (MSAMF) network, which makes full use of multi-slice reference image information by introducing a multi-slice aware module and a multi-scale matching strategy to capture corresponding contextual information in reference features at other scales. To further integrate the matched features, a multi-scale fusion mechanism is designed to progressively fuse multi-scale matching features, thereby generating more detailed super-resolution images. The experimental results support the benefits of our network in enhancing the quality of brain MRI reconstruction.
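The matching idea, finding for a query region the best-corresponding location in a reference at more than one scale, can be sketched with plain normalised cross-correlation. This is a generic stand-in, not the MSAMF network's learned matching: the patch size, the 2x2 average pooling used as a "coarser scale", and the random test data are all illustrative assumptions.

```python
import numpy as np

def best_match(query, reference):
    """Slide `query` over `reference` and return the offset with the highest
    normalised cross-correlation (cosine similarity of flattened patches)."""
    qh, qw = query.shape
    rh, rw = reference.shape
    q = query.ravel()
    q = q / (np.linalg.norm(q) + 1e-8)
    best, best_pos = -np.inf, (0, 0)
    for i in range(rh - qh + 1):
        for j in range(rw - qw + 1):
            p = reference[i:i + qh, j:j + qw].ravel()
            p = p / (np.linalg.norm(p) + 1e-8)
            score = float(q @ p)
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos, best

def downsample(x):
    # 2x2 average pooling: a crude stand-in for a coarser feature scale.
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
ref = rng.standard_normal((16, 16))
query = ref[4:8, 6:10].copy()  # a patch whose true location we know

pos_fine, _ = best_match(query, ref)                            # full resolution
pos_coarse, _ = best_match(downsample(query), downsample(ref))  # coarse scale
print(pos_fine, pos_coarse)  # fine match recovers (4, 6); coarse gives (2, 3)
```

Matching at several scales and then fusing the results (here trivially: the coarse offset is the fine one halved) is the structural pattern the abstract describes; the real network does this with learned features and a progressive fusion mechanism.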

{"title":"Enhancing Brain MRI Super-Resolution Through Multi-Slice Aware Matching and Fusion","authors":"Jie Xiang,&nbsp;Ang Zhao,&nbsp;Xia Li,&nbsp;Xubin Wu,&nbsp;Yanqing Dong,&nbsp;Yan Niu,&nbsp;Xin Wen,&nbsp;Yidi Li","doi":"10.1049/cit2.70032","DOIUrl":"https://doi.org/10.1049/cit2.70032","url":null,"abstract":"<p>In clinical diagnosis, magnetic resonance imaging (MRI) allows different contrast images to be obtained. High-resolution (HR) MRI presents fine anatomical structures, which is important for improving the efficiency of expert diagnosis and realising smart healthcare. However, due to the cost of scanning equipment and the time required for scanning, obtaining an HR brain MRI is quite challenging. Therefore, to improve the quality of images, reference-based super-resolution technology has come into existence. Nevertheless, the existing methods still have some drawbacks: (1) The advantages of different contrast images are not fully utilised. (2) The slice-by-slice scanning nature of magnetic resonance imaging is not considered. (3) The ability to capture contextual information and to match and fuse multi-scale, multi-contrast features is lacking. In this paper, we propose the multi-slice aware matching and fusion (MSAMF) network, which makes full use of multi-slice reference images information by introducing a multi-slice aware module and multi-scale matching strategy to capture corresponding contextual information in reference features at other scales. To further integrate matching features, a multi-scale fusion mechanism is also designed to progressively fuse multi-scale matching features, thereby generating more detailed super-resolution images. 
The experimental results support the benefits of our network in enhancing the quality of brain MRI reconstruction.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 5","pages":"1411-1421"},"PeriodicalIF":7.3,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70032","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145366412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Contrastive Learning-Based Multi-Level Knowledge Distillation
IF 7.3 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-07-04 DOI: 10.1049/cit2.70036
Lin Li, Jianping Gou, Weihua Ou, Wenbai Chen, Lan Du

With the increasing constraints of hardware devices, there is a growing demand for compact models to be deployed on device endpoints. Knowledge distillation, a widely used technique for model compression and knowledge transfer, has gained significant attention in recent years. However, traditional distillation approaches compare the knowledge of individual samples indirectly through class prototypes, overlooking the structural relationships between samples. Although recent distillation methods based on contrastive learning can capture relational knowledge, their relational constraints often distort the positional information of the samples, leading to compromised performance in the distilled model. To address these challenges and further enhance the performance of compact models, we propose a novel approach, termed contrastive learning-based multi-level knowledge distillation (CLMKD). The CLMKD framework introduces three key modules: class-guided contrastive distillation, gradient relation contrastive distillation, and semantic similarity distillation. These modules are effectively integrated into a unified framework to extract feature knowledge from multiple levels, capturing not only the representational consistency of individual samples but also their higher-order structure and semantic similarity. We evaluate the proposed CLMKD method on multiple image classification datasets, and the results demonstrate its superior performance compared to state-of-the-art knowledge distillation methods.
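A common contrastive-distillation objective (a generic InfoNCE-style loss, not necessarily the exact loss used in CLMKD) treats the teacher embedding of the same sample as the positive and all other teacher embeddings in the batch as negatives. A minimal NumPy version, with invented batch and embedding sizes:

```python
import numpy as np

def info_nce_distill_loss(student, teacher, tau=0.1):
    """Contrastive distillation: each student embedding should match the
    teacher embedding of the SAME sample (positive) and repel the teacher
    embeddings of every other sample in the batch (negatives)."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / tau                        # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal: student i pairs with teacher i.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(1)
teacher = rng.standard_normal((8, 16))
aligned = info_nce_distill_loss(teacher.copy(), teacher)  # student == teacher
random_student = rng.standard_normal((8, 16))
mismatched = info_nce_distill_loss(random_student, teacher)
print(aligned < mismatched)  # True: aligned embeddings give a lower loss
```

Minimising such a loss pulls each student representation toward its teacher counterpart while preserving relative structure across the batch, which is the relational knowledge the abstract contrasts with prototype-based matching.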

{"title":"Contrastive Learning-Based Multi-Level Knowledge Distillation","authors":"Lin Li,&nbsp;Jianping Gou,&nbsp;Weihua Ou,&nbsp;Wenbai Chen,&nbsp;Lan Du","doi":"10.1049/cit2.70036","DOIUrl":"https://doi.org/10.1049/cit2.70036","url":null,"abstract":"<p>With the increasing constraints of hardware devices, there is a growing demand for compact models to be deployed on device endpoints. Knowledge distillation, a widely used technique for model compression and knowledge transfer, has gained significant attention in recent years. However, traditional distillation approaches compare the knowledge of individual samples indirectly through class prototypes overlooking the structural relationships between samples. Although recent distillation methods based on contrastive learning can capture relational knowledge, their relational constraints often distort the positional information of the samples leading to compromised performance in the distilled model. To address these challenges and further enhance the performance of compact models, we propose a novel approach, termed contrastive learning-based multi-level knowledge distillation (CLMKD). The CLMKD framework introduces three key modules: class-guided contrastive distillation, gradient relation contrastive distillation, and semantic similarity distillation. These modules are effectively integrated into a unified framework to extract feature knowledge from multiple levels, capturing not only the representational consistency of individual samples but also their higher-order structure and semantic similarity. 
We evaluate the proposed CLMKD method on multiple image classification datasets and the results demonstrate its superior performance compared to state-of-the-art knowledge distillation methods.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 5","pages":"1478-1488"},"PeriodicalIF":7.3,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70036","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145366411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Design of a Marker-Based Human–Robot Following Motion Control Strategy
IF 7.3 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-07-01 DOI: 10.1049/cit2.70023
Zhigang Zhang, Yongsheng Guo, Xiaoxia Yu, Shuaishuai Ge

To address the challenges of jerky movements and poor tracking performance of a following-type mobile robot in outdoor environments, a novel marker-based human–robot following-motion control strategy is explored. This strategy decouples the control of linear velocity and angular velocity, handling them separately. First, in the design of linear-velocity control, marker identification is used to determine the distance between the human and the robot, and an enhanced virtual spring model is developed. A weighted dynamic damping coefficient is designed to keep the range and trend of the robot's following speed reasonable, thereby improving its smoothness and reducing the risk of target loss. Second, in the design of angular-velocity control, a new concept of an 'insensitive zone' based on the offset of the marker's centre point is proposed, combined with a fuzzy controller, to suppress robot jitter and enhance resistance to interference. The experimental results indicate that the average variance of the human–robot distance is 1.037 m, whereas the average variance of the robot's linear velocity is 0.345 m/s. Owing to the insensitive region designed into the parameter-adaptive fuzzy control, the average variance of the angular velocity is only 0.031 rad/s. When the human–robot distance fluctuates significantly, the fluctuations in both linear and angular velocity remain comparatively small, allowing stable and smooth following movements. This demonstrates the effectiveness of the motion control strategy designed in this study.
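A heavily simplified sketch of the two decoupled controllers described above. The gains, target distance, speed limits, and dead-zone width are invented for illustration; the paper's weighted dynamic damping coefficient is reduced to a constant, and its fuzzy controller is replaced by a plain proportional law outside the insensitive zone.

```python
def linear_velocity(distance, target=1.5, k=0.8, damping=0.5, prev_v=0.0,
                    v_max=1.2):
    """Spring-like linear-velocity law: the robot is pulled toward keeping
    `target` metres from the person; the damping term suppresses jerky
    speed changes between control cycles. All constants are illustrative."""
    v = k * (distance - target) - damping * prev_v
    return max(-v_max, min(v_max, v))

def angular_velocity(offset, dead_zone=0.05, gain=1.5, w_max=0.6):
    """Angular control with an 'insensitive zone': small offsets of the
    marker's centre point produce no turning at all, which suppresses
    jitter; larger offsets steer proportionally, with saturation."""
    if abs(offset) <= dead_zone:
        return 0.0
    sign = 1.0 if offset > 0 else -1.0
    w = gain * (offset - sign * dead_zone)
    return max(-w_max, min(w_max, w))

# At the target distance with a centred marker, the robot holds still.
print(linear_velocity(1.5), angular_velocity(0.0))  # 0.0 0.0
```

The dead zone is what trades a little steady-state offset for stability: offsets within ±0.05 are simply ignored, so sensor noise around a centred marker never turns the robot.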

{"title":"Design of a Marker-Based Human–Robot Following Motion Control Strategy","authors":"Zhigang Zhang,&nbsp;Yongsheng Guo,&nbsp;Xiaoxia Yu,&nbsp;Shuaishuai Ge","doi":"10.1049/cit2.70023","DOIUrl":"https://doi.org/10.1049/cit2.70023","url":null,"abstract":"<p>To address the challenges of jerky movements and poor tracking performance in outdoor environments for a following-type mobile robot, a novel marker-based human–machine following-motion control strategy is explored. This strategy decouples the control of linear velocity and angular velocity, handling them separately. First, in the design of linear-velocity control, using the identification of markers to determine the distance between the human and the robot, an enhanced virtual spring model is developed. This involves designing a weighted dynamic damping coefficient to address the rationality issues of the range and trend of the robot's following speed, thereby improving its smoothness and reducing the risk of target loss. Second, in the design of angular velocity control, a new concept of an ‘insensitive zone’ based on the offset of the marker's centre point is proposed, combined with a fuzzy controller to address the issue of robot jitter and enhance its resistance to interference. The experimental results indicate that the average variance in the human–robot distance is 1.037 m, whereas the average variance in the robot's linear velocity is 0.345 m/s. Due to the design of an insensitive region in parameter-adaptive fuzzy control, the average variance of angular velocity is only 0.031 rad/s. When the human–robot distance exhibits significant fluctuations, the fluctuations in both linear and angular velocities are comparatively small, allowing for stable and smooth following movements. 
This demonstrates the effectiveness of the motion control strategy designed in this study.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 5","pages":"1489-1500"},"PeriodicalIF":7.3,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70023","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145366436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Deep Backtracking Bare-Bones Particle Swarm Optimisation Algorithm for High-Dimensional Nonlinear Functions
IF 7.3 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-07-01 DOI: 10.1049/cit2.70028
Jia Guo, Guoyuan Zhou, Ke Yan, Yi Di, Yuji Sato, Zhou He, Binghua Shi

The challenge of optimising multimodal functions within high-dimensional domains constitutes a notable difficulty in evolutionary computation research. Addressing this issue, this study introduces the Deep Backtracking Bare-Bones Particle Swarm Optimisation (DBPSO) algorithm, an innovative approach built upon the integration of the Deep Memory Storage Mechanism (DMSM) and the Dynamic Memory Activation Strategy (DMAS). The DMSM enhances the memory retention for the globally optimal particle, promoting interaction between standard particles and their historically optimal counterparts. In parallel, DMAS assures the updated position of the globally optimal particle is appropriately aligned with the deep memory repository. The efficacy of DBPSO was rigorously assessed through a series of simulations employing the CEC2017 benchmark suite. A comparative analysis juxtaposed DBPSO's performance against five contemporary evolutionary algorithms across two experimental conditions: Dimension-50 and Dimension-100. In the 50D trials, DBPSO attained an average ranking of 2.03, whereas in the 100D scenarios, it improved to an average ranking of 1.9. Further examination utilising the CEC2019 benchmark functions revealed DBPSO's robustness, securing four first-place finishes, three second-place standings, and three third-place positions, culminating in an unmatched average ranking of 1.9 across all algorithms. These empirical results corroborate DBPSO's proficiency in delivering precise solutions for complex, high-dimensional optimisation challenges.
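DBPSO builds on bare-bones PSO, in which particles carry no velocity and instead sample each new position from a Gaussian defined by the personal and global bests. The NumPy sketch below implements only that plain baseline on the sphere function; the DMSM deep-memory repository and DMAS activation strategy that distinguish DBPSO are not reproduced, and the swarm size, iteration count, and search bounds are illustrative.

```python
import numpy as np

def bare_bones_pso(f, dim, n_particles=30, iters=200, seed=0):
    """Plain bare-bones PSO: each particle's next position is drawn from a
    Gaussian whose mean is the midpoint of its personal best and the global
    best, and whose std is their per-dimension distance."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, dim))
    pbest = x.copy()
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        mu = (pbest + g) / 2.0
        sigma = np.abs(pbest - g)
        x = rng.normal(mu, sigma + 1e-12)          # sample new positions
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f                    # update personal bests
        pbest[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        g = pbest[pbest_f.argmin()].copy()         # update global best
    return g, pbest_f.min()

sphere = lambda v: float(np.sum(v * v))            # global optimum 0 at origin
best, best_f = bare_bones_pso(sphere, dim=10)
print(best_f)  # a small value near the global optimum of 0
```

On a unimodal function like the sphere this baseline collapses quickly onto the optimum; the multimodal CEC2017/CEC2019 functions in the paper are precisely where its lack of memory hurts, which is the gap the DMSM/DMAS mechanisms target.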

{"title":"A Deep Backtracking Bare-Bones Particle Swarm Optimisation Algorithm for High-Dimensional Nonlinear Functions","authors":"Jia Guo,&nbsp;Guoyuan Zhou,&nbsp;Ke Yan,&nbsp;Yi Di,&nbsp;Yuji Sato,&nbsp;Zhou He,&nbsp;Binghua Shi","doi":"10.1049/cit2.70028","DOIUrl":"https://doi.org/10.1049/cit2.70028","url":null,"abstract":"<p>The challenge of optimising multimodal functions within high-dimensional domains constitutes a notable difficulty in evolutionary computation research. Addressing this issue, this study introduces the Deep Backtracking Bare-Bones Particle Swarm Optimisation (DBPSO) algorithm, an innovative approach built upon the integration of the Deep Memory Storage Mechanism (DMSM) and the Dynamic Memory Activation Strategy (DMAS). The DMSM enhances the memory retention for the globally optimal particle, promoting interaction between standard particles and their historically optimal counterparts. In parallel, DMAS assures the updated position of the globally optimal particle is appropriately aligned with the deep memory repository. The efficacy of DBPSO was rigorously assessed through a series of simulations employing the CEC2017 benchmark suite. A comparative analysis juxtaposed DBPSO's performance against five contemporary evolutionary algorithms across two experimental conditions: Dimension-50 and Dimension-100. In the 50D trials, DBPSO attained an average ranking of 2.03, whereas in the 100D scenarios, it improved to an average ranking of 1.9. Further examination utilising the CEC2019 benchmark functions revealed DBPSO's robustness, securing four first-place finishes, three second-place standings, and three third-place positions, culminating in an unmatched average ranking of 1.9 across all algorithms. 
These empirical results corroborate DBPSO's proficiency in delivering precise solutions for complex, high-dimensional optimisation challenges.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 5","pages":"1501-1520"},"PeriodicalIF":7.3,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70028","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145366316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0