On the design of deep learning-based control algorithms for visually guided UAVs engaged in power tower inspection tasks.

Frontiers in Robotics and AI (IF 2.9, Q2, ROBOTICS). Pub Date: 2024-04-26. eCollection Date: 2024-01-01. DOI: 10.3389/frobt.2024.1378149

Guillaume Maitre, Dimitri Martinot, Elio Tuci



Abstract

This paper focuses on the design of Convolutional Neural Networks to visually guide an autonomous Unmanned Aerial Vehicle (UAV) required to inspect power towers. The network must precisely segment images taken by a camera mounted on the UAV so that a motion module can generate collision-free and inspection-relevant manoeuvres of the UAV along different types of towers. The image segmentation process is particularly challenging, not only because of the different structures of the towers but also because of the enormous variability of the background, which can range from the uniform blue of the sky to the multi-colour complexity of a rural, forest, or urban area. To train networks that are robust enough to deal with this task variability without incurring a labour-intensive and costly annotation process for physical-world images, we carried out a comparative study in which we evaluate the performance of networks trained with synthetic images (i.e., the synthetic dataset), physical-world images (i.e., the physical-world dataset), or a combination of the two (i.e., the hybrid dataset). The network used is an attention-based U-NET. The synthetic images are created using photogrammetry, to accurately model power towers, and simulated environments that model a UAV inspecting different power towers in different settings. Our findings reveal that the network trained on the hybrid dataset outperforms the networks trained on the synthetic and the physical-world image datasets. Most notably, the network trained on the hybrid dataset demonstrates superior performance on multiple evaluation metrics related to the image-segmentation task. This suggests that the combination of synthetic and physical-world images represents the best trade-off between minimising the costs of capturing and annotating physical-world images and maximising task performance. Moreover, the results of our study demonstrate the potential of photogrammetry for creating effective training datasets with which to design networks that automate the precise movement of visually guided UAVs.
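The abstract does not name the evaluation metrics or say how the hybrid dataset is assembled, so the following is only a minimal sketch under stated assumptions. It assumes binary masks (1 = tower pixel, 0 = background), uses Intersection over Union and the Dice coefficient as two commonly used segmentation metrics, and builds a hybrid training list by simple concatenation and shuffling. The function names `make_hybrid`, `confusion_counts`, `iou` and `dice` and the file names are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: a mask is a 2D list of 0/1 values (1 = tower, 0 = background).
Mask = List[List[int]]


def make_hybrid(synthetic: Sequence[str], physical: Sequence[str],
                seed: int = 0) -> List[str]:
    """Hypothetical helper: merge two sample lists and shuffle reproducibly."""
    combined = list(synthetic) + list(physical)
    random.Random(seed).shuffle(combined)
    return combined


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) per pixel."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over Union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    # Two empty masks agree perfectly, so report 1.0 instead of dividing by 0.
    return tp / denom if denom else 1.0


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


if __name__ == "__main__":
    # Toy 3x4 masks standing in for a ground-truth and a predicted tower mask.
    truth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
    pred = [[0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 1, 1]]
    print("IoU :", round(iou(pred, truth), 3))
    print("Dice:", round(dice(pred, truth), 3))
    print(make_hybrid(["synthetic_0.png", "synthetic_1.png"], ["physical_0.png"]))
```

Computing the same metrics for networks trained on the synthetic, physical-world, and hybrid lists, against a common held-out set of physical-world images, is one way a comparison of this kind could be organised. The paper's actual protocol may differ.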


Source journal

Frontiers in Robotics and AI (ROBOTICS)

CiteScore: 6.50

Self-citation rate: 5.90%

Articles published: 355

Review time: 14 weeks

Journal description: Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.