EFFECT OF DATA QUALITY ON WATER BODY SEGMENTATION WITH DEEPLABV3+ ALGORITHM

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Q2, Social Sciences). Pub Date: 2023-09-05. DOI: 10.5194/isprs-archives-xlviii-m-3-2023-81-2023

Anirudh Edpuganti, Pillalamarri Akshaya, Jangala Gouthami, Sajith Variyar, Sowmya, R. Sivanpillai

Citations: 1

Abstract

Training Deep Learning (DL) algorithms to segment features requires hundreds to thousands of input images and corresponding labels. Generating thousands of input images and labels requires considerable resources and time. Hence, it is common practice to use open-source imagery and labels available online. Most of these open-source data have little or no metadata describing their quality or suitability, which makes them problematic for training or evaluating DL models. This study evaluated the effect of data quality on training DeepLabV3+, using Sentinel-2A/B RGB images and labels obtained from Kaggle. We generated subsets of 256 × 256 pixels, and 10% of these images (802) were set aside for testing. First, we trained and validated a DeepLabV3+ model with the remaining images. Second, we removed images with incorrect labels and trained another DeepLabV3+ network. Finally, we trained a third DeepLabV3+ network after removing images with turbid water or floating vegetation. All three trained models were evaluated with the test images, and accuracy metrics were calculated. As the quality of the input images improved, the accuracy of the predicted masks increased from 92.8% for the first model to 94.3% for the second model. The third model’s accuracy was 96.4%, demonstrating the network’s ability to better learn and predict water bodies when the input data had fewer class variations. Based on these results, we recommend assessing the quality of open-source data for incorrect labels and variations in the target class prior to training DeepLabV3+ or any other DL network.
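
The data preparation and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (tile_image, split_holdout, pixel_accuracy) are hypothetical, the tiles are assumed to be non-overlapping, the hold-out is assumed to be a simple random split, and pixel accuracy is assumed as the metric because the abstract only refers to "accuracy metrics" without naming them.

```python
import numpy as np


def tile_image(image, tile=256):
    """Cut an H x W (x C) array into non-overlapping tile x tile patches.

    Partial patches at the right and bottom edges are dropped
    (an assumption; the abstract does not say how edges were handled).
    """
    h, w = image.shape[:2]
    return [
        image[r:r + tile, c:c + tile]
        for r in range(0, h - tile + 1, tile)
        for c in range(0, w - tile + 1, tile)
    ]


def split_holdout(n_items, test_fraction=0.10, seed=0):
    """Return (train_indices, test_indices) for a random hold-out split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_test = int(round(n_items * test_fraction))
    return order[n_test:], order[:n_test]


def pixel_accuracy(pred_mask, true_mask):
    """Fraction of pixels whose predicted class (water / not water) matches the label."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    return float((pred == true).mean())


if __name__ == "__main__":
    # A synthetic 512 x 768 RGB scene stands in for a Sentinel-2 image.
    scene = np.zeros((512, 768, 3), dtype=np.uint8)
    tiles = tile_image(scene)
    print(len(tiles), tiles[0].shape)

    # 8020 is a hypothetical tile count chosen so that 10% equals 802.
    train_idx, test_idx = split_holdout(8020)
    print(len(train_idx), len(test_idx))

    # Toy 2 x 2 masks: three of the four pixels agree.
    print(pixel_accuracy([[1, 0], [1, 1]], [[1, 0], [0, 1]]))
```

In this sketch the test indices are drawn once and reused for all three models, mirroring the abstract's statement that the 10% of images set aside for testing were used to evaluate each trained network. The data cleaning steps (removing mislabeled, turbid-water or floating-vegetation images) would then apply only to the images selected by the training indices.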
