
Latest publications in IET Computer Vision

SRL-ProtoNet: Self-supervised representation learning for few-shot remote sensing scene classification
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-02 | DOI: 10.1049/cvi2.12304
Bing Liu, Hongwei Zhao, Jiao Li, Yansheng Gao, Jianrong Zhang

Using a deep learning method to classify a large amount of labelled remote sensing scene data produces good performance. However, it is challenging for deep learning based methods to generalise to classification tasks with limited data. Few-shot learning allows neural networks to classify unseen categories when confronted with a handful of labelled data. Currently, episodic tasks based on meta-learning can effectively complete few-shot classification, and training an encoder that can conduct representation learning has become an important component of few-shot learning. An end-to-end few-shot remote sensing scene classification model based on ProtoNet and self-supervised learning is proposed. The authors design the Pre-prototype for a more discrete feature space and better integration with self-supervised learning, and also propose the ProtoMixer for higher-quality prototypes with a global receptive field. The authors' method outperforms the existing state-of-the-art self-supervised methods on three widely used benchmark datasets: UC-Merced, NWPU-RESISC45, and AID. Compared with the previous state-of-the-art performance, this method improves by 1.21%, 2.36%, and 0.84% on AID, UC-Merced, and NWPU-RESISC45, respectively, in the one-shot setting, and by 0.85%, 2.79%, and 0.74% on AID, UC-Merced, and NWPU-RESISC45, respectively, in the five-shot setting.
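As a purely illustrative aside (not the authors' implementation, which adds the Pre-prototype and ProtoMixer modules on top), the prototypical-network step that SRL-ProtoNet builds on can be sketched in a few lines of Python: class prototypes are the mean embeddings of the support samples, and a query is assigned to the nearest prototype. The function names and toy embeddings below are assumptions for illustration.

    import numpy as np

    def prototypes(support_emb, support_labels, n_classes):
        # One prototype per class: the mean embedding of that class's support samples.
        return np.stack([support_emb[support_labels == c].mean(axis=0)
                         for c in range(n_classes)])

    def classify(query_emb, protos):
        # Assign each query to the class whose prototype is nearest (Euclidean distance).
        d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
        return d.argmin(axis=1)

    # Toy 5-way 1-shot episode with random 64-dimensional embeddings.
    rng = np.random.default_rng(0)
    support = rng.normal(size=(5, 64))
    labels = np.arange(5)
    queries = support + 0.1 * rng.normal(size=(5, 64))
    print(classify(queries, prototypes(support, labels, 5)))  # recovers [0 1 2 3 4]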

{"title":"SRL-ProtoNet: Self-supervised representation learning for few-shot remote sensing scene classification","authors":"Bing Liu,&nbsp;Hongwei Zhao,&nbsp;Jiao Li,&nbsp;Yansheng Gao,&nbsp;Jianrong Zhang","doi":"10.1049/cvi2.12304","DOIUrl":"https://doi.org/10.1049/cvi2.12304","url":null,"abstract":"<p>Using a deep learning method to classify a large amount of labelled remote sensing scene data produces good performance. However, it is challenging for deep learning based methods to generalise to classification tasks with limited data. Few-shot learning allows neural networks to classify unseen categories when confronted with a handful of labelled data. Currently, episodic tasks based on meta-learning can effectively complete few-shot classification, and training an encoder that can conduct representation learning has become an important component of few-shot learning. An end-to-end few-shot remote sensing scene classification model based on ProtoNet and self-supervised learning is proposed. The authors design the Pre-prototype for a more discrete feature space and better integration with self-supervised learning, and also propose the ProtoMixer for higher quality prototypes with a global receptive field. The authors’ method outperforms the existing state-of-the-art self-supervised based methods on three widely used benchmark datasets: UC-Merced, NWPU-RESISC45, and AID. Compare with previous state-of-the-art performance. For the one-shot setting, this method improves by 1.21%, 2.36%, and 0.84% in AID, UC-Merced, and NWPU-RESISC45, respectively. For the five-shot setting, this method surpasses by 0.85%, 2.79%, and 0.74% in the AID, UC-Merced, and NWPU-RESISC45, respectively.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"1034-1042"},"PeriodicalIF":1.5,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12304","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142563035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Balanced parametric body prior for implicit clothed human reconstruction from a monocular RGB
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-25 | DOI: 10.1049/cvi2.12306
Rong Xue, Jiefeng Li, Cewu Lu

The authors study the problem of reconstructing detailed 3D human surfaces in various poses and clothing from images. The parametric human body model allows accurate 3D clothed human reconstruction. However, the offset of large and loose clothing from the inferred parametric body mesh confines the generalisation of the existing parametric body-based methods. A distinctive method that simultaneously generalises well to unseen poses and unseen clothing is proposed. The authors first discover the unbalanced nature of existing implicit function-based methods. To address this issue, the authors propose to synthesise balanced training samples with a new dependency coefficient during training. The dependency coefficient tells the network whether the prior from the parametric body model is reliable. The authors then design a novel positional embedding-based attenuation strategy to incorporate the dependency coefficient into the implicit function (IF) network. Comprehensive experiments are conducted on the CAPE dataset to study the effectiveness of the authors' approach. The proposed method significantly surpasses state-of-the-art approaches and generalises well to unseen poses and clothing. As an illustrative example, the proposed method reduces the Chamfer Distance Error and Normal Error by 38.2% and 57.6%, respectively.
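For readers unfamiliar with the reported metric, a minimal symmetric Chamfer Distance between two point clouds can be computed as below. This is a generic sketch in plain numpy, not the exact evaluation protocol used on CAPE; the point clouds and their sizes are made up.

    import numpy as np

    def chamfer_distance(a, b):
        # a: (N, 3) and b: (M, 3) point clouds; returns the symmetric Chamfer distance.
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
        return d.min(axis=1).mean() + d.min(axis=0).mean()          # nearest-neighbour terms in both directions

    a = np.random.rand(1000, 3)              # e.g. points sampled from a reconstructed surface
    b = a + 0.01 * np.random.randn(1000, 3)  # e.g. points sampled from the ground-truth surface
    print(chamfer_distance(a, b))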

{"title":"Balanced parametric body prior for implicit clothed human reconstruction from a monocular RGB","authors":"Rong Xue,&nbsp;Jiefeng Li,&nbsp;Cewu Lu","doi":"10.1049/cvi2.12306","DOIUrl":"https://doi.org/10.1049/cvi2.12306","url":null,"abstract":"<p>The authors study the problem of reconstructing detailed 3D human surfaces in various poses and clothing from images. The parametric human body allows accurate 3D clothed human reconstruction. However, the offset of large and loose clothing from the inferred parametric body mesh confines the generalisation of the existing parametric body-based methods. A distinctive method that simultaneously generalises well to unseen poses and unseen clothing is proposed. The authors first discover the unbalanced nature of existing implicit function-based methods. To address this issue, the authors propose to synthesise the balanced training samples with a new dependency coefficient in training. The dependency coefficient can tell the network whether the prior from the parametric body model is reliable. The authors then design a novel positional embedding-based attenuation strategy to incorporate the dependency coefficient into the implicit function (IF) network. Comprehensive experiments are conducted on the CAPE dataset to study the effectiveness of the authors’ approach. The proposed method significantly surpasses state-of-the-art approaches and generalises well on unseen poses and clothing. As an illustrative example, the proposed method improves the Chamfer Distance Error and Normal Error by 38.2% and 57.6%.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"1057-1067"},"PeriodicalIF":1.5,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12306","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142563028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Social-ATPGNN: Prediction of multi-modal pedestrian trajectory of non-homogeneous social interaction
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-21 | DOI: 10.1049/cvi2.12286
Kehao Wang, Han Zou

With the development of automatic driving and path planning technology, predicting the moving trajectories of pedestrians in dynamic scenes has become one of the key and urgent technical problems. However, most existing techniques regard all pedestrians in the scene as having an equally important influence on the predicted pedestrian's trajectory, and existing methods that use sequence-based time-series generative models to obtain the predicted trajectories do not allow parallel computation, which introduces a significant computational overhead. A new social trajectory prediction network, Social-ATPGNN, which integrates both temporal and spatial information based on ATPGNN, is proposed. In the spatial domain, the pedestrians in the predicted scene are formed into an undirected, non-fully-connected graph, which solves the problem of homogenisation of pedestrian relationships; the spatial interaction between pedestrians is then encoded to improve the accuracy of modelling pedestrian social consciousness. After acquiring high-level spatial data, the method uses a Temporal Convolutional Network, which can perform parallel calculations, to capture the correlation of the time series of pedestrian trajectories. Through a large number of experiments, the proposed model shows its superiority over the latest models on various pedestrian trajectory datasets.
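The notion of a non-homogeneous, not fully connected interaction graph can be pictured with a simple distance-based adjacency: nearby pedestrians are connected with weights that decay with distance, and pairs beyond a radius share no edge at all. The Gaussian weighting and radius threshold below are illustrative assumptions, not the paper's actual edge definition.

    import numpy as np

    def interaction_graph(positions, radius=5.0, sigma=2.0):
        # positions: (N, 2) pedestrian coordinates at one time step.
        diff = positions[:, None, :] - positions[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        adj = np.exp(-dist ** 2 / (2 * sigma ** 2))  # closer pedestrians interact more strongly
        adj[dist > radius] = 0.0                     # drop distant pairs: the graph is not fully connected
        np.fill_diagonal(adj, 0.0)
        return adj

    positions = np.array([[0.0, 0.0], [1.0, 0.5], [10.0, 10.0]])
    print(interaction_graph(positions))  # the third pedestrian is isolated from the first two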

{"title":"Social-ATPGNN: Prediction of multi-modal pedestrian trajectory of non-homogeneous social interaction","authors":"Kehao Wang,&nbsp;Han Zou","doi":"10.1049/cvi2.12286","DOIUrl":"https://doi.org/10.1049/cvi2.12286","url":null,"abstract":"<p>With the development of automatic driving and path planning technology, predicting the moving trajectory of pedestrians in dynamic scenes has become one of key and urgent technical problems. However, most of the existing techniques regard all pedestrians in the scene as equally important influence on the predicted pedestrian's trajectory, and the existing methods which use sequence-based time-series generative models to obtain the predicted trajectories, do not allow for parallel computation, it will introduce a significant computational overhead. A new social trajectory prediction network, Social-ATPGNN which integrates both temporal information and spatial one based on ATPGNN is proposed. In space domain, the pedestrians in the predicted scene are formed into an undirected and non fully connected graph, which solves the problem of homogenisation of pedestrian relationships, then, the spatial interaction between pedestrians is encoded to improve the accuracy of modelling pedestrian social consciousness. After acquiring high-level spatial data, the method uses Temporal Convolutional Network which could perform parallel calculations to capture the correlation of time series of pedestrian trajectories. Through a large number of experiments, the proposed model shows the superiority over the latest models on various pedestrian trajectory datasets.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"907-921"},"PeriodicalIF":1.5,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12286","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142563036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
HIST: Hierarchical and sequential transformer for image captioning
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-15 | DOI: 10.1049/cvi2.12305
Feixiao Lv, Rui Wang, Lihua Jing, Pengwen Dai

Image captioning aims to automatically generate a natural language description of a given image, and most state-of-the-art models have adopted an encoder–decoder transformer framework. Such transformer structures, however, show two main limitations in the task of image captioning. Firstly, the traditional transformer obtains high-level fusion features to decode while ignoring other-level features, resulting in losses of image content. Secondly, the transformer is weak in modelling the natural order characteristics of language. To address these issues, the authors propose a HIerarchical and Sequential Transformer (HIST) structure, which forces each layer of the encoder and decoder to focus on features of different granularities, and strengthens the sequential semantic information. Specifically, to capture the details of different levels of features in the image, the authors combine the visual features of multiple regions and divide them into multiple levels differently. In addition, to enhance the sequential information, the sequential enhancement module in each decoder layer block extracts different levels of features for sequential semantic extraction and expression. Extensive experiments on the public datasets MS-COCO and Flickr30k have demonstrated the effectiveness of the proposed method, and show that it outperforms most previous state-of-the-art methods.
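To make the "different granularities" idea concrete, one simple way to derive multi-level inputs is to pool the same set of detected region features at several scales, from all individual regions down to a single global summary. The grouping scheme below is an assumption chosen only for illustration; the paper's actual level construction may differ.

    import numpy as np

    def multi_level_features(regions, level_sizes=(36, 12, 4, 1)):
        # regions: (36, D) region features; each level average-pools contiguous groups
        # of regions into progressively coarser summaries (fine -> global).
        levels = []
        for k in level_sizes:
            groups = np.array_split(regions, k, axis=0)
            levels.append(np.stack([g.mean(axis=0) for g in groups]))
        return levels

    regions = np.random.rand(36, 512)
    for level in multi_level_features(regions):
        print(level.shape)  # (36, 512), (12, 512), (4, 512), (1, 512)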

{"title":"HIST: Hierarchical and sequential transformer for image captioning","authors":"Feixiao Lv,&nbsp;Rui Wang,&nbsp;Lihua Jing,&nbsp;Pengwen Dai","doi":"10.1049/cvi2.12305","DOIUrl":"https://doi.org/10.1049/cvi2.12305","url":null,"abstract":"<p>Image captioning aims to automatically generate a natural language description of a given image, and most state-of-the-art models have adopted an encoder–decoder transformer framework. Such transformer structures, however, show two main limitations in the task of image captioning. Firstly, the traditional transformer obtains high-level fusion features to decode while ignoring other-level features, resulting in losses of image content. Secondly, the transformer is weak in modelling the natural order characteristics of language. To address theseissues, the authors propose a <b>HI</b>erarchical and <b>S</b>equential <b>T</b>ransformer (<b>HIST</b>) structure, which forces each layer of the encoder and decoder to focus on features of different granularities, and strengthen the sequentially semantic information. Specifically, to capture the details of different levels of features in the image, the authors combine the visual features of multiple regions and divide them into multiple levels differently. In addition, to enhance the sequential information, the sequential enhancement module in each decoder layer block extracts different levels of features for sequentially semantic extraction and expression. Extensive experiments on the public datasets MS-COCO and Flickr30k have demonstrated the effectiveness of our proposed method, and show that the authors’ method outperforms most of previous state of the arts.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"1043-1056"},"PeriodicalIF":1.5,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12305","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142563037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-modal video search by examples—A video quality impact analysis
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-27 | DOI: 10.1049/cvi2.12303
Guanfeng Wu, Abbas Haider, Xing Tian, Erfan Loweimi, Chi Ho Chan, Mengjie Qian, Awan Muhammad, Ivor Spence, Rob Cooper, Wing W. Y. Ng, Josef Kittler, Mark Gales, Hui Wang

As video content continues to proliferate and many video archives lack suitable metadata, video retrieval, particularly through example-based search, has become increasingly crucial. Existing metadata often fails to meet the needs of specific types of searches, especially when videos contain elements from different modalities, such as visual and audio. Consequently, developing video retrieval methods that can handle multi-modal content is essential. An innovative Multi-modal Video Search by Examples (MVSE) framework is introduced, employing state-of-the-art techniques in its various components. In designing MVSE, the authors focused on accuracy, efficiency, interactivity, and extensibility, with key components including advanced data processing and a user-friendly interface aimed at enhancing search effectiveness and user experience. Furthermore, the framework was comprehensively evaluated, assessing individual components, data quality issues, and overall retrieval performance using high-quality and low-quality BBC archive videos. The evaluation reveals that: (1) multi-modal search yields better results than single-modal search; (2) the quality of the video, both visual and audio, affects query precision, with audio quality having a greater impact on query precision than image quality; (3) a two-stage search process (i.e. searching by Hamming distance based on hashing, followed by searching by cosine similarity based on embedding) is effective but increases time overhead; (4) large-scale video retrieval is not only feasible but also expected to emerge shortly.
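The two-stage search mentioned in finding (3) can be sketched generically: a cheap Hamming-distance scan over binary hash codes shortlists candidates, which are then re-ranked by cosine similarity over dense embeddings. The hash length, shortlist size, and synthetic data below are assumptions for illustration, not MVSE internals.

    import numpy as np

    def two_stage_search(query_hash, query_emb, hashes, embs, shortlist=100):
        # Stage 1: Hamming distance over binary hash codes (cheap, coarse filter).
        hamming = (hashes != query_hash).sum(axis=1)
        candidates = np.argsort(hamming)[:shortlist]
        # Stage 2: cosine similarity over dense embeddings, computed only for the shortlist.
        q = query_emb / np.linalg.norm(query_emb)
        e = embs[candidates] / np.linalg.norm(embs[candidates], axis=1, keepdims=True)
        return candidates[np.argsort(-(e @ q))]

    rng = np.random.default_rng(1)
    hashes = rng.integers(0, 2, size=(10000, 64))  # 64-bit hash per archived clip
    embs = rng.normal(size=(10000, 256))           # dense embedding per archived clip
    ranked = two_stage_search(hashes[42], embs[42], hashes, embs)
    print(ranked[:5])                              # item 42 should come back first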

{"title":"Multi-modal video search by examples—A video quality impact analysis","authors":"Guanfeng Wu,&nbsp;Abbas Haider,&nbsp;Xing Tian,&nbsp;Erfan Loweimi,&nbsp;Chi Ho Chan,&nbsp;Mengjie Qian,&nbsp;Awan Muhammad,&nbsp;Ivor Spence,&nbsp;Rob Cooper,&nbsp;Wing W. Y. Ng,&nbsp;Josef Kittler,&nbsp;Mark Gales,&nbsp;Hui Wang","doi":"10.1049/cvi2.12303","DOIUrl":"10.1049/cvi2.12303","url":null,"abstract":"<p>As the proliferation of video content continues, and many video archives lack suitable metadata, therefore, video retrieval, particularly through example-based search, has become increasingly crucial. Existing metadata often fails to meet the needs of specific types of searches, especially when videos contain elements from different modalities, such as visual and audio. Consequently, developing video retrieval methods that can handle multi-modal content is essential. An innovative Multi-modal Video Search by Examples (MVSE) framework is introduced, employing state-of-the-art techniques in its various components. In designing MVSE, the authors focused on accuracy, efficiency, interactivity, and extensibility, with key components including advanced data processing and a user-friendly interface aimed at enhancing search effectiveness and user experience. Furthermore, the framework was comprehensively evaluated, assessing individual components, data quality issues, and overall retrieval performance using high-quality and low-quality BBC archive videos. The evaluation reveals that: (1) multi-modal search yields better results than single-modal search; (2) the quality of video, both visual and audio, has an impact on the query precision. Compared with image query results, audio quality has a greater impact on the query precision (3) a two-stage search process (i.e. searching by Hamming distance based on hashing, followed by searching by Cosine similarity based on embedding); is effective but increases time overhead; (4) large-scale video retrieval is not only feasible but also expected to emerge shortly.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"1017-1033"},"PeriodicalIF":1.5,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12303","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141798043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
2D human skeleton action recognition with spatial constraints
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-11 | DOI: 10.1049/cvi2.12296
Lei Wang, Jianwei Zhang, Wenbing Yang, Song Gu, Shanmin Yang

Human actions are predominantly presented in 2D format in video surveillance scenarios, which hinders the accurate determination of action details not apparent in 2D data. Depth estimation can aid human action recognition tasks, enhancing accuracy with neural networks. However, relying on images for depth estimation requires extensive computational resources and cannot utilise the connectivity between human body structures. Besides, the depth information may not accurately reflect actual depth ranges, necessitating improved reliability. Therefore, a 2D human skeleton action recognition method with spatial constraints (2D-SCHAR) is introduced. 2D-SCHAR employs graph convolution networks to process graph-structured human action skeleton data and comprises three parts: depth estimation, spatial transformation, and action recognition. The first two components, which infer 3D information from 2D human skeleton actions and generate spatial transformation parameters to correct abnormal deviations in action data, support the latter to enhance the accuracy of action recognition. The model is designed in an end-to-end, multitasking manner, allowing parameter sharing among these three components to boost performance. The experimental results validate the model's effectiveness and superiority in human skeleton action recognition.
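As background, a single graph-convolution step of the kind such skeleton models are built from can be written compactly: normalise the adjacency (with self-loops) and propagate joint features through it. The 5-joint chain and feature sizes below are toy assumptions, not the paper's skeleton graph or its spatial-constraint module.

    import numpy as np

    def gcn_layer(x, adj, w):
        # x: (J, C) joint features, adj: (J, J) skeleton adjacency, w: (C, C_out) weights.
        a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
        d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(1)))  # symmetric degree normalisation
        return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ x @ w, 0.0)  # ReLU activation

    adj = np.zeros((5, 5))
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:          # a toy 5-joint chain
        adj[i, j] = adj[j, i] = 1.0
    x = np.random.rand(5, 2)                               # 2D joint coordinates as input features
    w = np.random.rand(2, 8)
    print(gcn_layer(x, adj, w).shape)                      # (5, 8)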

{"title":"2D human skeleton action recognition with spatial constraints","authors":"Lei Wang,&nbsp;Jianwei Zhang,&nbsp;Wenbing Yang,&nbsp;Song Gu,&nbsp;Shanmin Yang","doi":"10.1049/cvi2.12296","DOIUrl":"10.1049/cvi2.12296","url":null,"abstract":"<p>Human actions are predominantly presented in 2D format in video surveillance scenarios, which hinders the accurate determination of action details not apparent in 2D data. Depth estimation can aid human action recognition tasks, enhancing accuracy with neural networks. However, reliance on images for depth estimation requires extensive computational resources and cannot utilise the connectivity between human body structures. Besides, the depth information may not accurately reflect actual depth ranges, necessitating improved reliability. Therefore, a 2D human skeleton action recognition method with spatial constraints (2D-SCHAR) is introduced. 2D-SCHAR employs graph convolution networks to process graph-structured human action skeleton data comprising three parts: depth estimation, spatial transformation, and action recognition. The initial two components, which infer 3D information from 2D human skeleton actions and generate spatial transformation parameters to correct abnormal deviations in action data, support the latter in the model to enhance the accuracy of action recognition. The model is designed in an end-to-end, multitasking manner, allowing parameter sharing among these three components to boost performance. The experimental results validate the model's effectiveness and superiority in human skeleton action recognition.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"968-981"},"PeriodicalIF":1.5,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12296","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141657484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Centre-loss—A preferred class verification approach over sample-to-sample in self-checkout products datasets
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-11 | DOI: 10.1049/cvi2.12302
Bernardas Ciapas, Povilas Treigys

Siamese networks excel at comparing two images, serving as an effective class verification technique when there is a single reference image per class. However, when multiple reference images are present, Siamese verification necessitates multiple comparisons and aggregation, which is often impractical at inference. The Centre-Loss approach proposed in this research solves the class verification task more efficiently than sample-to-sample approaches, using a single forward pass during inference. Optimising a Centre-Loss function learns class centres and minimises intra-class distances in latent space. The authors compared verification accuracy using Centre-Loss against aggregated Siamese verification when other hyperparameters (such as neural network backbone and distance type) are the same. Experiments were performed to contrast the ubiquitous Euclidean distance against other distance types and to discover the optimum Centre-Loss layer, its size, and the Centre-Loss weight. In the optimal architecture, the Centre-Loss layer is connected to the penultimate layer, calculates Euclidean distance, and its size depends on the distance type. The Centre-Loss method was validated on the Self-Checkout products and Fruits 360 image datasets. Its comparable accuracy and lower complexity make Centre-Loss a preferred approach over sample-to-sample methods for the class verification task when the number of reference images per class is high and inference speed is a factor, such as in self-checkouts.
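The centre-loss idea itself is standard and easy to state: keep one centre per class, penalise the squared distance between each embedding and its class centre during training, and at inference verify a sample with a single forward pass plus a distance to the stored centre. The plain-numpy sketch below uses squared Euclidean distance and a made-up threshold; it deliberately omits the distance-type and layer-placement choices that the paper studies.

    import numpy as np

    def centre_loss(embeddings, labels, centres):
        # Mean squared distance between each embedding and the centre of its own class.
        return ((embeddings - centres[labels]) ** 2).sum(axis=1).mean()

    def verify(embedding, centres, threshold=1.0):
        # Single forward pass at inference: accept the nearest class centre
        # only if the distance falls below the threshold.
        d = np.linalg.norm(centres - embedding, axis=1)
        c = int(d.argmin())
        return (c, d[c]) if d[c] < threshold else (None, d[c])

    centres = np.random.rand(10, 128)               # 10 product classes, 128-d latent space
    emb = centres[3] + 0.05 * np.random.randn(128)  # an embedding close to class 3
    print(centre_loss(emb[None], np.array([3]), centres))
    print(verify(emb, centres))                     # expected to accept class 3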

{"title":"Centre-loss—A preferred class verification approach over sample-to-sample in self-checkout products datasets","authors":"Bernardas Ciapas,&nbsp;Povilas Treigys","doi":"10.1049/cvi2.12302","DOIUrl":"10.1049/cvi2.12302","url":null,"abstract":"<p>Siamese networks excel at comparing two images, serving as an effective class verification technique for a single-per-class reference image. However, when multiple reference images are present, Siamese verification necessitates multiple comparisons and aggregation, often unpractical at inference. The Centre-Loss approach, proposed in this research, solves a class verification task more efficiently, using a single forward-pass during inference, than sample-to-sample approaches. Optimising a Centre-Loss function learns class centres and minimises intra-class distances in latent space. The authors compared verification accuracy using Centre-Loss against aggregated Siamese when other hyperparameters (such as neural network backbone and distance type) are the same. Experiments were performed to contrast the ubiquitous Euclidean against other distance types to discover the optimum Centre-Loss layer, its size, and Centre-Loss weight. In optimal architecture, the Centre-Loss layer is connected to the penultimate layer, calculates Euclidean distance, and its size depends on distance type. The Centre-Loss method was validated on the Self-Checkout products and Fruits 360 image datasets. Centre-Loss comparable accuracy and lesser complexity make it a preferred approach over sample-to-sample for the class verification task, when the number of reference image per class is high and inference speed is a factor, such as in self-checkouts.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"1004-1016"},"PeriodicalIF":1.5,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12302","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141657814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
GR-Former: Graph-reinforcement transformer for skeleton-based driver action recognition
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-10 | DOI: 10.1049/cvi2.12298
Zhuoyan Xu, Jingke Xu

In in-vehicle driving scenarios, composite action recognition is crucial for improving safety and understanding the driver's intention. Due to spatial constraints and occlusion factors, the driver's range of motion is limited, resulting in similar action patterns that are difficult to differentiate. Additionally, collecting skeleton data that characterise the full human posture is difficult, posing additional challenges for action recognition. To address these problems, a novel Graph-Reinforcement Transformer (GR-Former) model is proposed. Using limited skeleton data as inputs, by introducing graph structure information to directionally reinforce the effect of the self-attention mechanism, and by dynamically learning and aggregating features between joints at multiple levels, the authors' model constructs a richer feature vector space, enhancing its expressiveness and recognition accuracy. On the Drive & Act dataset for composite action recognition, the authors' work achieves state-of-the-art performance compared to existing methods while applying only human upper-body skeleton data. Using complete human skeleton data, it also achieves excellent recognition accuracy on the NTU RGB+D and NTU RGB+D 120 datasets, demonstrating the strong generalisability of the GR-Former. Overall, the authors' work provides a new and effective solution for driver action recognition in in-vehicle scenarios.
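One common way to let graph structure reinforce self-attention, offered here only as an illustrative guess at the flavour of mechanism involved rather than the GR-Former design, is to add an adjacency-derived bias to the attention logits before the softmax:

    import numpy as np

    def graph_biased_attention(x, adj, wq, wk, wv, alpha=1.0):
        # x: (J, C) joint features; adj: (J, J) skeleton adjacency used as an attention bias.
        q, k, v = x @ wq, x @ wk, x @ wv
        logits = q @ k.T / np.sqrt(k.shape[1]) + alpha * adj  # graph bias added to the logits
        weights = np.exp(logits - logits.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)         # row-wise softmax
        return weights @ v

    J, C = 12, 16                                             # e.g. 12 upper-body joints, 16 channels
    x = np.random.rand(J, C)
    adj = (np.random.rand(J, J) > 0.7).astype(float)          # toy adjacency
    wq, wk, wv = (np.random.rand(C, C) for _ in range(3))
    print(graph_biased_attention(x, adj, wq, wk, wv).shape)   # (12, 16)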

{"title":"GR-Former: Graph-reinforcement transformer for skeleton-based driver action recognition","authors":"Zhuoyan Xu,&nbsp;Jingke Xu","doi":"10.1049/cvi2.12298","DOIUrl":"10.1049/cvi2.12298","url":null,"abstract":"<p>In in-vehicle driving scenarios, composite action recognition is crucial for improving safety and understanding the driver's intention. Due to spatial constraints and occlusion factors, the driver's range of motion is limited, thus resulting in similar action patterns that are difficult to differentiate. Additionally, collecting skeleton data that characterise the full human posture is difficult, posing additional challenges for action recognition. To address the problems, a novel Graph-Reinforcement Transformer (GR-Former) model is proposed. Using limited skeleton data as inputs, by introducing graph structure information to directionally reinforce the effect of the self-attention mechanism, dynamically learning and aggregating features between joints at multiple levels, the authors’ model constructs a richer feature vector space, enhancing its expressiveness and recognition accuracy. Based on the Drive &amp; Act dataset for composite action recognition, the authors’ work only applies human upper-body skeleton data to achieve state-of-the-art performance compared to existing methods. Using complete human skeleton data also has excellent recognition accuracy on the NTU RGB + D- and NTU RGB + D 120 dataset, demonstrating the great generalisability of the GR-Former. Generally, the authors’ work provides a new and effective solution for driver action recognition in in-vehicle scenarios.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"982-991"},"PeriodicalIF":1.5,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12298","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141659905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-scale skeleton simplification graph convolutional network for skeleton-based action recognition
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-08 | DOI: 10.1049/cvi2.12300
Fan Zhang, Ding Chongyang, Kai Liu, Liu Hongjin

Human action recognition based on graph convolutional networks (GCNs) is one of the hotspots in computer vision. However, previous methods generally rely on handcrafted graphs, which limits the effectiveness of the model in characterising the connections between indirectly connected joints. This limitation leads to weakened connections when joints are separated by long distances. To address the above issue, the authors propose a skeleton simplification method which aims to reduce the number of joints and the distance between joints by merging adjacent joints into simplified joints. A group convolutional block is devised to extract the internal features of the simplified joints. Additionally, the authors enhance the method by introducing multi-scale modelling, which maps inputs into sequences across various levels of simplification. Combined with spatial-temporal graph convolution, a multi-scale skeleton simplification GCN for skeleton-based action recognition (M3S-GCN) is proposed for fusing multi-scale skeleton sequences and modelling the connections between joints. Finally, M3S-GCN is evaluated on five benchmarks from the NTU RGB+D 60 (C-Sub, C-View), NTU RGB+D 120 (X-Sub, X-Set) and NW-UCLA datasets. Experimental results show that the authors' M3S-GCN achieves state-of-the-art performance with accuracies of 93.0%, 97.0% and 91.2% on the C-Sub, C-View and X-Set benchmarks, which validates the effectiveness of the method.
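The skeleton-simplification step can be pictured as merging predefined groups of adjacent joints into single simplified joints by averaging their coordinates, which reduces both the joint count and the hop distance between body parts. The merge groups below are invented for illustration and do not correspond to the paper's actual merge rules.

    import numpy as np

    def simplify_skeleton(joints, groups):
        # joints: (J, C) joint coordinates; groups: lists of joint indices to merge.
        return np.stack([joints[g].mean(axis=0) for g in groups])

    joints = np.random.rand(25, 3)                          # e.g. the 25 NTU RGB+D joints
    groups = [[0, 1], [2, 3, 20], [4, 5, 6], [7, 21, 22],   # hypothetical merge groups
              [8, 9, 10], [11, 23, 24], [12, 13], [14, 15],
              [16, 17], [18, 19]]
    print(simplify_skeleton(joints, groups).shape)          # (10, 3): fewer, closer-spaced joints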

{"title":"Multi-scale skeleton simplification graph convolutional network for skeleton-based action recognition","authors":"Fan Zhang,&nbsp;Ding Chongyang,&nbsp;Kai Liu,&nbsp;Liu Hongjin","doi":"10.1049/cvi2.12300","DOIUrl":"10.1049/cvi2.12300","url":null,"abstract":"<p>Human action recognition based on graph convolutional networks (GCNs) is one of the hotspots in computer vision. However, previous methods generally rely on handcrafted graph, which limits the effectiveness of the model in characterising the connections between indirectly connected joints. The limitation leads to weakened connections when joints are separated by long distances. To address the above issue, the authors propose a skeleton simplification method which aims to reduce the number of joints and the distance between joints by merging adjacent joints into simplified joints. Group convolutional block is devised to extract the internal features of the simplified joints. Additionally, the authors enhance the method by introducing multi-scale modelling, which maps inputs into sequences across various levels of simplification. Combining with spatial temporal graph convolution, a multi-scale skeleton simplification GCN for skeleton-based action recognition (M3S-GCN) is proposed for fusing multi-scale skeleton sequences and modelling the connections between joints. Finally, M3S-GCN is evaluated on five benchmarks of NTU RGB+D 60 (C-Sub, C-View), NTU RGB+D 120 (X-Sub, X-Set) and NW-UCLA datasets. Experimental results show that the authors’ M3S-GCN achieves state-of-the-art performance with the accuracies of 93.0%, 97.0% and 91.2% on C-Sub, C-View and X-Set benchmarks, which validates the effectiveness of the method.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"992-1003"},"PeriodicalIF":1.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12300","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141668289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Self-supervised multi-view clustering in computer vision: A survey
IF 1.5 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-02 | DOI: 10.1049/cvi2.12299
Jiatai Wang, Zhiwei Xu, Xuewen Yang, Hailong Li, Bo Li, Xuying Meng

In recent years, multi-view clustering (MVC) has had significant implications in the fields of cross-modal representation learning and data-driven decision-making. Its main objective is to cluster samples into distinct groups by leveraging consistency and complementary information among multiple views. However, the field of computer vision has witnessed the evolution of contrastive learning, and self-supervised learning has made substantial research progress. Consequently, self-supervised learning is progressively becoming dominant in MVC methods. It involves designing proxy tasks to extract supervisory information from image and video data, thereby guiding the clustering process. Despite the rapid development of self-supervised MVC, there is currently no comprehensive survey analysing and summarising the current state of research progress. Hence, the authors aim to explore the emergence of self-supervised MVC by discussing the reasons and advantages behind it. Additionally, the internal connections and classifications of common datasets, data issues, representation learning methods, and self-supervised learning methods are investigated. The authors not only introduce the mechanisms for each category of methods, but also provide illustrative examples of their applications. Finally, some open problems are identified for further investigation and development.

{"title":"Self-supervised multi-view clustering in computer vision: A survey","authors":"Jiatai Wang,&nbsp;Zhiwei Xu,&nbsp;Xuewen Yang,&nbsp;Hailong Li,&nbsp;Bo Li,&nbsp;Xuying Meng","doi":"10.1049/cvi2.12299","DOIUrl":"https://doi.org/10.1049/cvi2.12299","url":null,"abstract":"<p>In recent years, multi-view clustering (MVC) has had significant implications in the fields of cross-modal representation learning and data-driven decision-making. Its main objective is to cluster samples into distinct groups by leveraging consistency and complementary information among multiple views. However, the field of computer vision has witnessed the evolution of contrastive learning, and self-supervised learning has made substantial research progress. Consequently, self-supervised learning is progressively becoming dominant in MVC methods. It involves designing proxy tasks to extract supervisory information from image and video data, thereby guiding the clustering process. Despite the rapid development of self-supervised MVC, there is currently no comprehensive survey analysing and summarising the current state of research progress. Hence, the authors aim to explore the emergence of self-supervised MVC by discussing the reasons and advantages behind it. Additionally, the internal connections and classifications of common datasets, data issues, representation learning methods, and self-supervised learning methods are investigated. The authors not only introduce the mechanisms for each category of methods, but also provide illustrative examples of their applications. Finally, some open problems are identified for further investigation and development.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 6","pages":"709-734"},"PeriodicalIF":1.5,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12299","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142158626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0