Kaili Wang, Xinwei Sun, Huijie He, Fenhua Bai, Tao Shen
In recent years, the transformer model has demonstrated excellent performance in computer vision (CV) applications. The key lies in its guided representation attention mechanism, which uses the dot product to describe complex feature relationships and draws on contextual semantics to obtain feature weights. Feature enhancement is then implemented by guiding the target matrix with these feature weights. However, uncertainty and inconsistency in features are widespread and tend to confuse the description of relationships within dot-product attention mechanisms. To solve this problem, this paper proposes a novel approximate-guided representation learning methodology for the vision transformer. The kernelised matroids fuzzy rough set is defined, wherein the closed sets inside kernelised fuzzy information granules of matroid structures can constitute the subspace of the lower approximation in rough sets. Thus, the kernel relation is employed to characterise image feature granules, which are reconstructed according to the independent sets of matroid theory. Then, according to the characteristics of the closed sets within matroids, the feature attention weight is formed by using the lower approximation to realise the approximate guidance of features. The approximate-guided representation mechanism can be flexibly deployed as a plug-and-play component in a wide range of CV tasks. Extensive empirical results demonstrate that the proposed method outperforms the majority of prevalent advanced models, especially in terms of robustness.
Approximate-Guided Representation Learning in Vision Transformer. CAAI Transactions on Intelligence Technology 10(5), pp. 1459-1477, published 2025-07-15. DOI: https://doi.org/10.1049/cit2.70041. PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70041
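The dot-product attention that the abstract above takes as its starting point can be sketched as follows. This is a minimal NumPy illustration of standard scaled dot-product attention, not the approximate-guided mechanism proposed in the paper; the function names, the toy token matrix and the use of the same matrix for queries, keys and values are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dot_product_attention(q, k, v):
    """Standard scaled dot-product attention (illustrative, not the paper's method).

    q, k, v: arrays of shape (num_tokens, dim).
    Returns the re-weighted values and the attention weights.
    """
    dim = q.shape[-1]
    scores = q @ k.T / np.sqrt(dim)      # pairwise feature relationships
    weights = softmax(scores, axis=-1)   # feature weights, one row per query
    return weights @ v, weights          # weights guide the target matrix v


# Hypothetical toy input: 4 image tokens with 8 features each.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out, w = dot_product_attention(tokens, tokens, tokens)
```

Each row of `w` is a probability distribution over tokens, which is the "feature weight" role the abstract describes; the paper replaces how these weights are formed, which this sketch does not attempt to reproduce.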
Mohamed Meselhy Eltoukhy, Faisal S. Alsubaei, Mostafa M. Abdel-Aziz, Khalid M. Hosny
Medical images play a crucial role in diagnosis, treatment procedures and overall healthcare. Nevertheless, they also pose substantial risks to patient confidentiality and safety. Safeguarding the confidentiality of patients' data has become an urgent and practical concern. We present a novel approach to reversible data hiding for colour medical images. In a hybrid domain, we employ AlexNet, tuned with watershed transform (WST) and L-shaped fractal Tromino encryption. Our approach begins by constructing the host image's feature vector using a pre-trained AlexNet model. Next, we use the watershed transform to convert the extracted feature vector into a vector for a topographic map, which we then encrypt using an L-shaped fractal Tromino cryptosystem. We embed the secret image in the transformed image vector using a histogram-based embedding strategy to enhance payload and visual fidelity. In the absence of attacks, RDHNet exhibits robust performance, can be reversed to recover the original image and maintains a visually appealing stego image, with an average PSNR of 73.14 dB, an SSIM of 0.9999 and perfect values of NC = 1 and BER = 0. The proposed RDHNet also demonstrates a robust ability to withstand detrimental geometric and noise-adding attacks as well as various steganalysis methods. Furthermore, our RDHNet method demonstrates efficacy in tackling contemporary confidentiality issues.
RDHNet: Reversible Data Hiding Method for Securing Colour Images Using AlexNet and Watershed Transform in a Fusion Domain. CAAI Transactions on Intelligence Technology 10(5), pp. 1422-1445, published 2025-07-13. DOI: https://doi.org/10.1049/cit2.70038. PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70038
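Two of the quality metrics reported in the abstract above, PSNR and BER, follow standard definitions that can be sketched as below. This assumes 8-bit images (peak value 255) and a flat bit sequence for the hidden payload; the helper names and the toy arrays are hypothetical and are not taken from the RDHNet paper.

```python
import numpy as np


def psnr(original, stego, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = original.astype(np.float64) - stego.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion at all
    return float(10.0 * np.log10(peak ** 2 / mse))


def bit_error_rate(sent_bits, recovered_bits):
    """Fraction of payload bits that differ after extraction."""
    sent = np.asarray(sent_bits).ravel()
    recovered = np.asarray(recovered_bits).ravel()
    return float(np.mean(sent != recovered))


# Hypothetical toy data: a flat 4x4 host image with one pixel changed by 1.
host = np.full((4, 4), 100, dtype=np.uint8)
stego = host.copy()
stego[0, 0] = 101

toy_psnr = psnr(host, stego)
toy_ber = bit_error_rate([1, 0, 1, 1], [1, 0, 0, 1])  # one of four bits flipped
```

A BER of 0, as reported for the no-attack case, means every embedded bit was recovered exactly; higher PSNR means the stego image is closer to the host.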
In recent years, 3D object detection using neural radiance fields (NeRF) has advanced significantly, yet challenges remain in effectively utilising the density field. Current methods often treat NeRF as a geometry learning tool or rely on volume rendering, neglecting the density field's potential and feature dependencies. To address this, we propose NeRF-C3D, a novel framework incorporating a multi-scale feature fusion module with channel attention (MFCA). MFCA leverages channel attention to model feature dependencies, dynamically adjusting channel weights during fusion to enhance important features and suppress redundancy. This optimises the density field representation and improves feature discriminability. Experiments on 3D-FRONT, Hypersim and ScanNet demonstrate NeRF-C3D's superior performance, validating MFCA's effectiveness in capturing feature relationships for NeRF-based 3D detection.
Minling Zhu, Yadong Gong, Dongbing Gu, Chunwei Tian. Improving 3D Object Detection in Neural Radiance Fields With Channel Attention. CAAI Transactions on Intelligence Technology 10(5), pp. 1446-1458, published 2025-07-10. DOI: https://doi.org/10.1049/cit2.70045. PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70045
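The idea of dynamically adjusting channel weights, as described in the abstract above, can be illustrated with a generic squeeze-and-excitation style gate. This is a sketch of one common form of channel attention, not the MFCA module itself, whose internals the abstract does not specify; the function name, the reduction ratio `r` and the random weight matrices are assumptions.

```python
import numpy as np


def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel gating (illustrative only).

    features: array of shape (channels, height, width).
    w1: (channels // r, channels) bottleneck weights (hypothetical).
    w2: (channels, channels // r) expansion weights (hypothetical).
    """
    squeezed = features.mean(axis=(1, 2))          # one summary value per channel
    hidden = np.maximum(0.0, w1 @ squeezed)        # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate per channel
    # Scale every channel by its gate: important channels are kept,
    # redundant ones are attenuated.
    return features * gates[:, None, None], gates


# Hypothetical toy feature map: 8 channels on a 5x5 grid, reduction ratio 2.
rng = np.random.default_rng(0)
C, r = 8, 2
feats = rng.normal(size=(C, 5, 5))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
reweighted, gates = channel_attention(feats, w1, w2)
```

In a trained network `w1` and `w2` would be learned, so the gates depend on the input features rather than on random weights as in this toy example.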
Nanfang Xu, Shanshan Liu, Yuepeng Chen, Kailai Zhang, Chenyi Guo, Cheng Zhang, Fei Xu, Qifeng Lan, Wanyi Fu, Xingyu Zhou, Bo Zhao, Aodong He, Xiangling Fu, Ji Wu, Weishi Li
Lumbar degenerative disc diseases are a major contributor to lower back pain. To better understand lumbar degenerative pathology and develop more effective treatments, precise measurement of lumbar segment kinematics is imperative. This study aims to develop a novel automated lumbar spine orientation estimation method using deep learning techniques, in order to facilitate automatic 2D–3D pre-registration of the lumbar spine during physiological movements and to enhance both the efficiency of image registration and the accuracy of spinal segment kinematic measurements. A total of 12 asymptomatic volunteers were enrolled and imaged in 2 oblique views with 7 different postures. The images were used for deep learning model development, training and evaluation. The model was composed of a segmentation module using Mask R-CNN and an estimation module using the ResNet50 architecture with a Squeeze-and-Excitation module. The cosine of the angle between the predicted vector and the ground-truth vector was used to quantify model performance. Data from another two prospectively recruited asymptomatic volunteers were used to compare the time cost of model-assisted registration with that of manual registration without a model. The cosine values of the vector deviation angles at the three axes of the Cartesian coordinate system were 0.9667 ± 0.004, 0.9593 ± 0.0047 and 0.9828 ± 0.0025, respectively. The angular deviation between the intermediate vector obtained from the three direction vectors and the ground truth was 10.7103 ± 0.7466. The results show the consistency and reliability of the model's predictions across different experiments and axes and demonstrate that our approach significantly reduces the registration time (3.47 ± 0.90 min vs. 8.10 ± 1.60 min, p < 0.001), enhances efficiency and broadens its use in clinical research on kinematic measurements.
Deep Learning Approach for Automated Estimation of 3D Vertebral Orientation of the Lumbar Spine. CAAI Transactions on Intelligence Technology 10(5), pp. 1306-1319, published 2025-07-10. DOI: https://doi.org/10.1049/cit2.70033. PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70033
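The evaluation metric named in the abstract above, the cosine of the angle between a predicted direction vector and the ground-truth vector, can be computed as in the following sketch. The function names and the example vectors are hypothetical; the abstract does not state how the study's own evaluation code is organised.

```python
import numpy as np


def cosine_between(pred, truth):
    """Cosine of the angle between a predicted and a ground-truth vector."""
    pred = np.asarray(pred, dtype=np.float64)
    truth = np.asarray(truth, dtype=np.float64)
    return float(pred @ truth / (np.linalg.norm(pred) * np.linalg.norm(truth)))


def angle_degrees(pred, truth):
    """Deviation angle in degrees, derived from the cosine."""
    # Clip guards against values slightly outside [-1, 1] from rounding.
    cos = np.clip(cosine_between(pred, truth), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))


# Hypothetical axis vectors: a perfect prediction and an orthogonal one.
same_direction = cosine_between([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
orthogonal_angle = angle_degrees([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

A cosine close to 1 means the predicted axis is nearly parallel to the ground truth, and the metric is insensitive to vector length, so only orientation is scored.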
Nayef Alqahtani, Arfat Ahmad Khan, Rakesh Kumar Mahendran, Muhammad Faheem
Lung cancer (LC) is a major cancer that accounts for high mortality rates worldwide. Doctors use many imaging modalities to identify lung tumours and their severity at early stages. Nowadays, machine learning (ML) and deep learning (DL) methodologies are used for the robust detection and prediction of lung tumours. Recently, multimodal imaging has emerged as a robust technique for lung tumour detection that combines various imaging features. To this end, we propose a novel multimodal imaging technique named versatile scale malleable image integration and patch wise attention network (