Efficient class-agnostic obstacle detection for UAV-assisted waterway inspection systems
Pablo Alonso, Jon Ander Íñiguez de Gordoa, Juan Diego Ortega, Marcos Nieto
Ensuring the safety of water airport runways is essential for the correct operation of seaplane flights. Among other tasks, airport operators must identify and remove various objects that may have drifted into the runway area. In this paper, the authors propose a complete and embedded-friendly waterway obstacle detection pipeline that runs on a camera-equipped drone. The system uses a class-agnostic version of the YOLOv7 detector, which detects objects regardless of their class. Additionally, using the drone's GPS data and the camera parameters, the locations of the detected objects are pinpointed with a Distance Root Mean Square (DRMS) error of 0.58 m. On the authors' own annotated dataset, the system generates alerts for detected objects with a recall of 0.833 and a precision of 1.
{"title":"Efficient class-agnostic obstacle detection for UAV-assisted waterway inspection systems","authors":"Pablo Alonso, Jon Ander Íñiguez de Gordoa, Juan Diego Ortega, Marcos Nieto","doi":"10.1049/cvi2.12319","DOIUrl":"https://doi.org/10.1049/cvi2.12319","url":null,"abstract":"<p>Ensuring the safety of water airport runways is essential for the correct operation of seaplane flights. Among other tasks, airport operators must identify and remove various objects that may have drifted into the runway area. In this paper, the authors propose a complete and embedded-friendly waterway obstacle detection pipeline that runs on a camera-equipped drone. This system uses a class-agnostic version of the YOLOv7 detector, which is capable of detecting objects regardless of its class. Additionally, through the usage of the GPS data of the drone and camera parameters, the location of the objects are pinpointed with 0.58 m Distance Root Mean Square. In our own annotated dataset, the system is capable of generating alerts for detected objects with a recall of 0.833 and a precision of 1.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1087-1096"},"PeriodicalIF":1.5,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12319","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To crop or not to crop: Comparing whole-image and cropped classification on a large dataset of camera trap images
Tomer Gadot, Ștefan Istrate, Hyungwon Kim, Dan Morris, Sara Beery, Tanya Birch, Jorge Ahumada
Camera traps facilitate non-invasive wildlife monitoring, but their widespread adoption has created a data processing bottleneck: a camera trap survey can create millions of images, and the labour required to review those images strains the resources of conservation organisations. AI is a promising approach for accelerating image review, but AI tools for camera trap data are imperfect; in particular, classifying small animals remains difficult, and accuracy falls off outside the ecosystems in which a model was trained. It has been proposed that incorporating an object detector into an image analysis pipeline may help address these challenges, but the benefit of object detection has not been systematically evaluated in the literature. In this work, the authors assess the hypothesis that classifying animals cropped from camera trap images using a species-agnostic detector yields better accuracy than classifying whole images. They find that incorporating an object detection stage into an image classification pipeline yields a macro-average F1 improvement of around 25% on a large, long-tailed dataset; this improvement is reproducible on a large public dataset and a smaller public benchmark dataset. The authors describe a classification architecture that performs well for both whole and detector-cropped images, and demonstrate that this architecture yields state-of-the-art benchmark accuracy.
{"title":"To crop or not to crop: Comparing whole-image and cropped classification on a large dataset of camera trap images","authors":"Tomer Gadot, Ștefan Istrate, Hyungwon Kim, Dan Morris, Sara Beery, Tanya Birch, Jorge Ahumada","doi":"10.1049/cvi2.12318","DOIUrl":"https://doi.org/10.1049/cvi2.12318","url":null,"abstract":"<p>Camera traps facilitate non-invasive wildlife monitoring, but their widespread adoption has created a data processing bottleneck: a camera trap survey can create millions of images, and the labour required to review those images strains the resources of conservation organisations. AI is a promising approach for accelerating image review, but AI tools for camera trap data are imperfect; in particular, classifying small animals remains difficult, and accuracy falls off outside the ecosystems in which a model was trained. It has been proposed that incorporating an object detector into an image analysis pipeline may help address these challenges, but the benefit of object detection has not been systematically evaluated in the literature. In this work, the authors assess the hypothesis that classifying animals cropped from camera trap images using a species-agnostic detector yields better accuracy than classifying whole images. We find that incorporating an object detection stage into an image classification pipeline yields a macro-average F1 improvement of around 25% on a large, long-tailed dataset; this improvement is reproducible on a large public dataset and a smaller public benchmark dataset. The authors describe a classification architecture that performs well for both whole and detector-cropped images, and demonstrate that this architecture yields state-of-the-art benchmark accuracy.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1193-1208"},"PeriodicalIF":1.5,"publicationDate":"2024-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12318","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comprehensive research on light field imaging: Theory and application
Fei Liu, Yunlong Wang, Qing Yang, Shubo Zhou, Kunbo Zhang
Computational photography combines novel optical designs and processing methods to capture high-dimensional visual information. As an emerging and promising technique, light field (LF) imaging measures the lighting, reflectance, focus, geometry and viewpoint in free space, and has been widely explored over the past decades for depth estimation, view synthesis, refocusing, rendering, 3D displays, microscopy and other computer vision applications. In this paper, the authors present a comprehensive survey of LF imaging theory, technology and applications. Firstly, the LF imaging process based on a MicroLens Array (MLA) structure, that is, MLA-LF, is derived. Subsequently, the innovations of LF imaging technology are presented in terms of imaging prototypes, consumer LF cameras and LF displays in Virtual Reality (VR) and Augmented Reality (AR). Finally, the applications and challenges of LF imaging combined with deep learning models in recent years are analysed, including depth estimation, saliency detection, semantic segmentation, de-occlusion and defocus deblurring. It is believed that this paper will be a good reference for future research on LF imaging technology in the Artificial Intelligence era.
{"title":"A comprehensive research on light field imaging: Theory and application","authors":"Fei Liu, Yunlong Wang, Qing Yang, Shubo Zhou, Kunbo Zhang","doi":"10.1049/cvi2.12321","DOIUrl":"https://doi.org/10.1049/cvi2.12321","url":null,"abstract":"<p>Computational photography is a combination of novel optical designs and processing methods to capture high-dimensional visual information. As an emerged promising technique, light field (LF) imaging measures the lighting, reflectance, focus, geometry and viewpoint in the free space, which has been widely explored for depth estimation, view synthesis, refocus, rendering, 3D displays, microscopy and other applications in computer vision in the past decades. In this paper, the authors present a comprehensive research survey on the LF imaging theory, technology and application. Firstly, the LF imaging process based on a MicroLens Array structure is derived, that is MLA-LF. Subsequently, the innovations of LF imaging technology are presented in terms of the imaging prototype, consumer LF camera and LF displays in Virtual Reality (VR) and Augmented Reality (AR). Finally the applications and challenges of LF imaging integrating with deep learning models are analysed, which consist of depth estimation, saliency detection, semantic segmentation, de-occlusion and defocus deblurring in recent years. It is believed that this paper will be a good reference for the future research on LF imaging technology in Artificial Intelligence era.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1269-1284"},"PeriodicalIF":1.5,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12321","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DEUFormer: High-precision semantic segmentation for urban remote sensing images
Xinqi Jia, Xiaoyong Song, Lei Rao, Guangyu Fan, Songlin Cheng, Niansheng Chen
Urban remote sensing image semantic segmentation has a wide range of applications, such as urban planning, resource exploration and intelligent transportation. Although UNetFormer performs well by introducing the self-attention mechanism of the Transformer, it still suffers from relatively low segmentation accuracy and significant edge segmentation errors. To this end, this paper proposes DEUFormer, which employs a special weighted-sum method to fuse the features of the encoder and the decoder, thus capturing both local details and global context information. Moreover, an Enhanced Feature Refinement Head is designed to finely re-weight features along the channel dimension and narrow the semantic gap between shallow and deep features, thereby enhancing multi-scale feature extraction. Additionally, an Edge-Guided Context Module is introduced to enhance edge areas through effective edge detection, improving edge information extraction. Experimental results show that DEUFormer achieves a mean Intersection over Union (mIoU) of 53.8% on the LoveDA dataset and 69.1% on the UAVid dataset. Notably, the IoU for the building class on the LoveDA dataset is 5.0% higher than that of UNetFormer. The proposed model outperforms methods such as UNetFormer on multiple datasets, which demonstrates its effectiveness.
{"title":"DEUFormer: High-precision semantic segmentation for urban remote sensing images","authors":"Xinqi Jia, Xiaoyong Song, Lei Rao, Guangyu Fan, Songlin Cheng, Niansheng Chen","doi":"10.1049/cvi2.12313","DOIUrl":"https://doi.org/10.1049/cvi2.12313","url":null,"abstract":"<p>Urban remote sensing image semantic segmentation has a wide range of applications, such as urban planning, resource exploration, intelligent transportation, and other scenarios. Although UNetFormer performs well by introducing the self-attention mechanism of Transformer, it still faces challenges arising from relatively low segmentation accuracy and significant edge segmentation errors. To this end, this paper proposes DEUFormer by employing a special weighted sum method to fuse the features of the encoder and the decoder, thus capturing both local details and global context information. Moreover, an Enhanced Feature Refinement Head is designed to finely re-weight features on the channel dimension and narrow the semantic gap between shallow and deep features, thereby enhancing multi-scale feature extraction. Additionally, an Edge-Guided Context Module is introduced to enhance edge areas through effective edge detection, which can improve edge information extraction. Experimental results show that DEUFormer achieves an average Mean Intersection over Union (mIoU) of 53.8% on the LoveDA dataset and 69.1% on the UAVid dataset. Notably, the mIoU of buildings in the LoveDA dataset is 5.0% higher than that of UNetFormer. The proposed model outperforms methods such as UNetFormer on multiple datasets, which demonstrates its effectiveness.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1209-1222"},"PeriodicalIF":1.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12313","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}