EFFECT OF HYPERPARAMETERS ON DEEPLABV3+ PERFORMANCE TO SEGMENT WATER BODIES IN RGB IMAGES

Onteddu Chaitanya Reddy, Illa Dinesh Kumar, Pingali Sathvika, Sajith Variyar, Sowmya, R. Sivanpillai
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Journal article, published 2023-09-05.
DOI: 10.5194/isprs-archives-xlviii-m-3-2023-203-2023 (https://doi.org/10.5194/isprs-archives-xlviii-m-3-2023-203-2023)

Abstract

Deep learning (DL) networks used for image segmentation must be trained with input images and corresponding masks that identify the target features in them. DL networks learn by iteratively adjusting the weights of interconnected layers using backpropagation, a process that involves calculating gradients and minimizing a loss function. This allows the network to learn patterns and relationships in the data, enabling it to make predictions or classifications on new, unseen data. Training any DL network requires specifying values for hyperparameters such as input image size, batch size, and number of epochs, among others. Failure to specify optimal values for these hyperparameters will increase training time or result in incomplete learning. This study evaluated the effect of input image size and batch size on the performance of DeepLabV3+ using Sentinel-2A/B RGB images and labels obtained from Kaggle. We trained the DeepLabV3+ network six times, using input images of 128 × 128 and 256 × 256 pixels, each with batch sizes of 4, 8, and 16. Each model was trained for 100 epochs to ensure that the loss curve reached saturation and the model converged to a stable solution. The predicted masks generated by each model were compared with their corresponding test masks using accuracy, precision, recall, and F1 score. The combination of 256 × 256 input images and a batch size of 4 achieved the highest performance. The results also suggest that the larger input image size improved DeepLabV3+ performance.
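
The experimental design and evaluation described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the abstract does not say how the metrics were computed, so the version below assumes pixel-wise scores on binary masks in which 1 marks water. The function names `training_grid` and `segmentation_scores` are hypothetical.

```python
from itertools import product

import numpy as np

# Hyperparameter values reported in the abstract.
IMAGE_SIZES = (128, 256)   # square inputs, in pixels
BATCH_SIZES = (4, 8, 16)
EPOCHS = 100               # same for every run


def training_grid():
    """Return the six (image_size, batch_size) combinations (hypothetical helper)."""
    return list(product(IMAGE_SIZES, BATCH_SIZES))


def segmentation_scores(pred_mask, true_mask):
    """Pixel-wise accuracy, precision, recall and F1 for binary masks.

    Assumption: nonzero pixels are the positive (water) class.
    """
    pred = np.asarray(pred_mask).astype(bool).ravel()
    true = np.asarray(true_mask).astype(bool).ravel()

    tp = int(np.sum(pred & true))     # water predicted as water
    tn = int(np.sum(~pred & ~true))   # background predicted as background
    fp = int(np.sum(pred & ~true))    # background predicted as water
    fn = int(np.sum(~pred & true))    # water predicted as background

    accuracy = (tp + tn) / pred.size
    # Guard against division by zero when a class is absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    for image_size, batch_size in training_grid():
        print(f"run: {image_size}x{image_size} input, batch size {batch_size}, "
              f"{EPOCHS} epochs")

    # Toy 2x2 masks, not data from the study.
    print(segmentation_scores([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```

In a full experiment, each of the six combinations would train a separate DeepLabV3+ model, and the scores would be averaged over all test masks. Because accuracy can look high when water covers only a small fraction of the pixels, precision, recall, and F1 are reported alongside it.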