ACT-Agent: Affinity-Cross Transformer for Point Cloud Registration via Reinforcement Learning
Fengguang Xiong, Haixin Gong, Qiao Ma, Yingbo Jia, Ruize Guo, Yu Cao, Ligang He, Liqun Kuang, Xie Han
IET Image Processing, DOI: 10.1049/ipr2.70283
Point cloud registration, a core task in 3D computer vision for aligning two point clouds via rotation and translation, underpins critical applications such as robotic navigation and 3D reconstruction. Classical methods (e.g., Iterative Closest Point) easily converge to local minima under poor initial alignment. Deep learning-based approaches, while efficient, suffer from high annotation costs for large-scale data. Existing reinforcement learning (RL)-based methods rely on simple PointNet feature extractors, which are insensitive to local geometric details and thus yield suboptimal registration precision. To address these challenges, we propose ACT-Agent (Affinity-Cross Transformer for point cloud registration via reinforcement learning), a novel method that formulates point cloud registration as an RL Markov decision process for iterative optimisation. We leverage PointNet and an Affinity-Cross Transformer to extract and enhance expressive salient features and to assign adaptive weights to channels according to their relative importance. RL allows the agent to learn autonomously from environmental feedback, removing the dependence on data annotation. Experimental results on ModelNet40 (synthetic data) and ScanObjectNN (real-world data) demonstrate that ACT-Agent achieves higher accuracy, efficiency, and generalisation ability than state-of-the-art point cloud registration methods.
Residual Attention Smoothing Mixup Network for Efficient Oil Country Tubular Goods Defect Classification
Lijuan Zhu, Chun Feng, Peng Wang, Xiaoyu Dou, Hao Chang, Lu Li
IET Image Processing, DOI: 10.1049/ipr2.70278
Image classification is a fundamental task in computer vision, with deep learning significantly improving its accuracy. However, the accurate classification of defect types in industrial imaging, such as for oil country tubular goods (OCTGs), remains a challenge, particularly when dealing with limited datasets. This paper addresses the classification of four distinct damage types in OCTG images under small-sample conditions using the residual attention smoothing mixup network (RASMN) model. Our approach integrates a residual attention network for efficient feature extraction, label smoothing to mitigate overfitting, and mixup data augmentation for enhanced model robustness. Experimental results demonstrate that RASMN significantly improves classification accuracy, achieving a Top-1 error rate of 7.6%. This represents a substantial improvement, cutting the error of our baseline residual attention network (15.5%) by more than half and outperforming widely used architectures such as ResNet18 (16.4%) on this specific task. The significance of these results lies in providing a validated, high-performance model for a challenging industrial classification task with limited data, balancing high accuracy with an efficient inference time of 3.94 ms. This study offers an effective deep learning solution for classifying tube defect images, highlighting the efficacy of combining residual attention networks with regularization strategies.
Localization With Approximate Nearest Neighbour Search
Roland Kotroczó, Dániel Varga, János Márk Szalai-Gindl, Bence Formanek, Péter Vaderna
IET Image Processing, DOI: 10.1049/ipr2.70242
Localization and place recognition are important tasks in many fields, including autonomous driving, robotics, and AR/VR applications. Local and global feature-based solutions typically rely on exact nearest neighbour search methods, such as KD-tree, to retrieve candidate places or frames and estimate the precise sensor position using point correspondences. However, in large-scale applications, maintaining real-time online processing without loss of performance can be challenging. We propose that by using an approximate nearest neighbour search method instead of exact methods, runtime can be significantly reduced without sacrificing accuracy. To demonstrate this, we developed a localization pipeline based on a keypoint voting mechanism, employing the hierarchical navigable small world (HNSW) structure as the nearest neighbour search method. Graph-based structures like HNSW are widely used in other domains, such as recommender systems and large language models. We argue that for the use case of matching local feature descriptors, the slightly lower accuracy in terms of exact neighbours does not lead to a significant increase in localization error. We evaluated our pipeline on widely known datasets and performed parameter tuning of HNSW specifically for this use case.
Time-Varying 3D Gaussian Splatting Representation for Dynamic Scenes
Yunbiao Liu, Chunyi Chen, Jun Peng, Xiaojuan Hu, Yu Fan
IET Image Processing, DOI: 10.1049/ipr2.70280
Real-time photorealistic novel view synthesis of dynamic scenes remains a challenging task, primarily due to the inherent complexity of temporal dynamics and motion patterns. Although recent methods based on Gaussian splatting have shown considerable progress in this regard, they remain limited by high memory consumption. In this paper, we propose a time-varying 3D Gaussian splatting (TVGS) representation for dynamic scenes, which incorporates two key components. First, we model the scene using 3D Gaussians endowed with temporal opacity and time-varying motion parameters. These attributes effectively capture transient phenomena such as the sudden appearance or disappearance of dynamic elements. Second, we introduce an adaptive density control mechanism to optimise the distribution of these time-varying 3D Gaussians throughout the sequence. As an explicit dynamic scene representation, TVGS not only achieves high-fidelity view synthesis but also attains a real-time rendering speed of 160 FPS on the Neural 3D Video Dataset using a single RTX 4090 GPU.
Multi-Channel Fusion Residual Network for Robust Bone Fracture Classification From Radiographs
Sivapriya T, K. R. Sri Preethaa, Yuvaraj Natarajan, M. Shyamala Devi
IET Image Processing, DOI: 10.1049/ipr2.70277
Accurate bone fracture classification from radiographs is hindered by low fracture visibility, imaging artefacts and high intra-class similarity. To overcome this, a multi-channel fusion residual network (MFResNet18) is proposed that integrates a multi-modal channel (MMC) filter with a multi-path early feature extraction scheme to enrich fracture-relevant features before deep inference. The MMC filter transforms each fracture image into five complementary channels: the original image, a Frangi-filtered map for fracture line enhancement, a Difference of Gaussians (DoG) edge map, a mid-frequency wavelet decomposition and a bone mask for contextual details. These channels are processed through three parallel shallow CNN paths. Path 1 handles pathological features using the original image and the Frangi channel, path 2 processes edge and frequency features from the DoG and wavelet channels, and path 3 processes anatomical features using the bone mask as an attention channel. The outputs are fused through convolution in a feature fusion layer, which adaptively learns inter-modal features while preserving spatial fidelity. The fused feature map is then propagated through a modified ResNet18 backbone for hierarchical residual learning. Experimental results on the bone fracture dataset demonstrate that MFResNet18 achieves 99.72% classification accuracy, significantly outperforming conventional ResNet18 and other existing models. The integration of MMC filtering, multi-path early specialisation and learnable feature fusion is the key novelty of this work, offering a robust, extensible framework for fine-grained bone fracture classification.
RoboLoc: A Benchmark Dataset for Point Place Recognition and Localization in Indoor–Outdoor Integrated Environments
Jaejin Jeon, Seonghoon Ryoo, Sang-Duck Lee, Soomok Lee, Seungwoo Jeong
IET Image Processing, DOI: 10.1049/ipr2.70267
Robust place recognition is essential for reliable localization in robotics, particularly in complex environments with frequent indoor–outdoor transitions. However, existing LiDAR-based datasets often focus on outdoor scenarios and lack seamless domain shifts. In this paper, we propose RoboLoc, a benchmark dataset designed for GPS-free place recognition in indoor–outdoor environments with floor transitions. RoboLoc features real-world robot trajectories, diverse elevation profiles, and transitions between structured indoor and unstructured outdoor domains. We benchmark a variety of state-of-the-art models, including point-based, voxel-based, and BEV-based architectures, highlighting their generalizability under domain shifts. RoboLoc provides a realistic testbed for developing multi-domain localization systems in robotics and autonomous navigation.
YOLO-EPDS: A Small Object Detection Algorithm for Power Transmission Line Nut Spacing Looseness
Guilan Wang, Zenglei Hao, Wangbin Cao, Huawei Mei
IET Image Processing, DOI: 10.1049/ipr2.70279
To address the challenges of feature loss, inaccurate localization, and false or missed detections in small-object detection of loosened and spaced nuts in power transmission lines, this study proposes an enhanced detection model, you only look once-EPDS (YOLO-EPDS), built upon an improved YOLOv9 framework. A RepNCSPELAN4_EMA module is integrated into the backbone network to incorporate a multi-scale attention mechanism, enhancing the extraction of subtle nut texture features via cross-space interactions and parallel multi-branch feature recalibration. SPD-Conv modules replace conventional downsampling layers in the backbone, effectively preserving spatial details in feature maps. Additionally, a RepNCSPELAN4_DCNv4 module employs dynamic deformable convolutions (DCNv4) to improve adaptability to geometrically deformed objects. The shape-IoU loss function is utilized to optimize bounding box regression for small objects. Experimental results indicate that the proposed model achieves a mAP@50 of 79.7% on a self-constructed transmission line nut dataset, outperforming the baseline by 4.3%. These enhancements synergistically increase confidence scores while reducing false-positive and false-negative rates, demonstrating superior capability in extracting defective features of loosened nuts and substantially improving the reliability of transmission line inspection.
Guided by Principles of Composition: A Domain-Specific Priors Based Detector for Recognizing Ritual Implements in Thangka
Jiachen Li, Hongyun Wang, Xiaolong Peng, Jinyu Xu, Qing Xie, Yanchun Ma, Wenbo Jiang, Mengzi Tang
IET Image Processing, DOI: 10.1049/ipr2.70271
Detecting ritual implements in Thangka paintings, such as swords and scriptures, remains challenging due to their intricate visual composition and symbolic complexity. Existing object detection models, typically trained on natural scenes, tend to perform poorly in this domain. To address this limitation, we summarize the principles of composition in Thangka and identify key spatial and co-occurrence priors specific to ritual implements. Based on these insights, we propose GPCDet, a detector guided by principles of composition that integrates domain-specific priors into the detection process. Specifically, we introduce a spatial coordinate attention module to emphasize critical spatial regions where implements frequently appear. In addition, we design a graph convolution network auxiliary detection module to model inter-category co-occurrence, thereby enhancing feature representation and improving classification performance. Experiments on the newly curated ritual implements in Thangka (RITK) dataset show that GPCDet achieves substantial improvements over existing methods, establishing a new state-of-the-art baseline for this challenging task.
M-PointNet: A Multi-Layer Embedded Deep Learning Model for 3D Intracranial Aneurysm Classification and Segmentation
Jiaqi Wang, Juntong Liu, Zhengyuan Xu, Yunfeng Zhou, Mingquan Ye
IET Image Processing, DOI: 10.1049/ipr2.70275
Accurate classification and segmentation of intracranial aneurysms from 3D point cloud data are critical for computer-aided diagnosis and surgical planning. However, existing point-based deep learning methods suffer from limited feature representation and poor segmentation performance on medical data due to insufficient training samples and complex geometric variations. M-PointNet introduces a novel multi-layer embedded deep learning architecture that significantly enhances the classification and segmentation of intracranial aneurysms through three key innovations: (1) an enhanced PointNet++ with an expanded hierarchical structure for better geometric feature extraction; (2) a multi-layer embedding mechanism that integrates preprocessed and resampled point cloud data at multiple hierarchical levels to enrich feature representation; and (3) a deep supervision strategy with auxiliary output layers to accelerate convergence and improve performance. Experiments on the IntrA dataset demonstrate that M-PointNet achieves 91.96% accuracy and a 0.923 F1-score in classification, surpassing the baseline by 5.27% and 3.0%, respectively. For segmentation, it attains 83.85% IoU and 90.25% DSC for aneurysm regions and 95.81% IoU and 97.82% DSC for vessel regions. Additionally, its generalization capability is validated by a 92.8% accuracy on the ModelNet40 dataset. M-PointNet effectively addresses the challenges of medical point cloud analysis, achieving state-of-the-art performance in intracranial aneurysm classification and segmentation while maintaining robust cross-domain generalization.
A Robust Reversible Watermarking Algorithm Resistant to Geometric Attacks Based on Tchebichef Moments
Wenjing Sun, Ling Zhang, Hongjun Zhang
IET Image Processing, DOI: 10.1049/ipr2.70265
This paper presents a robust reversible watermarking algorithm based on a two-stage embedding strategy. In the first stage, the host image is partitioned into non-overlapping blocks. Embedding locations are selected within the inscribed circle of the host image by leveraging the Just Noticeable Distortion (JND) threshold, and copyright watermarks are embedded into the lower-order Tchebichef Moments (TMs) at these positions. The watermark quantisation error is converted to an integer value using an enhanced Distortion-Compensated Quantisation Index Modulation (DC-QIM) technique. In the second stage, compensation data is embedded into image blocks located outside the inscribed circle, thereby ensuring reversibility in the absence of attacks. Prior to watermark extraction, a resynchronisation method tailored to the specific type of attack is applied to realign the block positions, significantly improving robustness against geometric distortions. Compared with existing advanced methods of the same kind, the proposed algorithm effectively addresses key limitations of traditional methods, including the trade-off between robustness and reversibility, redundancy in compensation data, and insufficient resistance to geometric attacks. The amount of compensation information is reduced by over 90%. Under comparable experimental settings, the average peak signal-to-noise ratio (PSNR) is improved by 0.5–2.2 dB.
Extensive experiments demonstrate that, in terms of resistance to noise interference, the performance of the proposed algorithm is comparable to that of methods based on Zernike Moments (ZMs) and Pseudo-Zernike Moments (PZMs). The algorithm achieves a bit error rate (BER) of less than 1% under Joint Photographic Experts Group (JPEG) compression, salt-and-pepper noise with intensity ≤ 0.017, Gaussian noise with variance ≤ 0.011, and rotation and scaling attacks under ideal resynchronisation conditions. When subjected to random cropping attacks of 128 × 128 pixels, the average BER remains below 7%. It also demonstrates robust resilience against various attacks, including filtering and translation.