Analysis of the hyperparameter optimisation of four machine learning satellite imagery classification methods

IF 2.1 | Zone 3, Earth Sciences | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Computational Geosciences | Pub Date: 2024-04-05 | DOI: 10.1007/s10596-024-10285-y

Francisco Alonso-Sarría, Carmen Valdivieso-Ros, Francisco Gomariz-Castillo


Citations: 0

Abstract


Analysis of the hyperparameter optimisation of four machine learning satellite imagery classification methods

The classification of land use and land cover (LULC) from remotely sensed imagery in semi-arid Mediterranean areas is a challenging task due to the fragmentation of the landscape and the diversity of spatial patterns. Recently, the use of deep learning (DL) for image analysis has increased compared to commonly used machine learning (ML) methods. This paper compares the performance of four algorithms, Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Convolutional Network (CNN), using multi-source data and applying an exhaustive optimisation of the hyperparameters. The usual approach when optimising a LULC classification model is to keep the most accurate model without analysing the rest of the results. In this study, we have analysed those results, discovering noteworthy patterns in a space defined by the mean and standard deviation of the validation accuracy estimated in a 10-fold cross-validation (CV). The point distributions in this space do not appear to be completely random, but show clusters of points that facilitate the discovery of hyperparameter values that tend to increase the mean accuracy and decrease its standard deviation. RF is not the most accurate model, but it is the least sensitive to changes in hyperparameters. Neural networks tend to increase commission and omission errors of the less represented classes because their optimisation leads the model to learn the most frequent classes better. On the other hand, RF and MLP prediction layers are the most accurate from a general qualitative point of view.
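The analysis described in the abstract, looking at every hyperparameter combination in the (mean, standard deviation) space of 10-fold CV accuracy rather than only the winner, can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' pipeline: the synthetic imbalanced dataset, the hyperparameter grid, and all variable names are assumptions, since the abstract gives no such details. The last lines compute per-class omission error (1 minus recall) and commission error (1 minus precision) from a cross-validated confusion matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict

# Synthetic, imbalanced stand-in for per-pixel multi-source features (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6,
    n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0,
)

# Hypothetical grid; the ranges explored in the paper are not given in the abstract.
param_grid = {"n_estimators": [20, 50], "max_features": [2, 4, 8]}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid, scoring="accuracy", cv=cv,
)
search.fit(X, y)

# One (mean, std) point per hyperparameter combination, not just the best one.
means = search.cv_results_["mean_test_score"]
stds = search.cv_results_["std_test_score"]
points = sorted(
    zip(search.cv_results_["params"], means, stds),
    key=lambda t: (-t[1], t[2]),  # highest mean first, then lowest std
)
for params, mean_acc, std_acc in points:
    print(f"{params}  mean={mean_acc:.3f}  std={std_acc:.3f}")

# Per-class errors of the selected model, from out-of-fold predictions.
y_pred = cross_val_predict(search.best_estimator_, X, y, cv=cv)
cm = confusion_matrix(y, y_pred)  # rows = reference class, columns = predicted class
omission = 1 - np.diag(cm) / cm.sum(axis=1)
# Guard against a class that is never predicted (column sum of zero).
commission = 1 - np.diag(cm) / np.maximum(cm.sum(axis=0), 1)
print("omission error per class:  ", np.round(omission, 3))
print("commission error per class:", np.round(commission, 3))
```

Plotting `means` against `stds` as a scatter plot, coloured by one hyperparameter at a time, is one way to look for the kind of clusters the abstract mentions.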


Source journal

Computational Geosciences (Earth Sciences: Geosciences, Multidisciplinary)

CiteScore

6.10

Self-citation rate

4.00%

Articles published

Review time

6-12 weeks

About the journal: Computational Geosciences publishes high-quality papers on mathematical modeling, simulation, numerical analysis, and other computational aspects of the geosciences. In particular, the journal focuses on advanced numerical methods for the simulation of subsurface flow and transport, and associated aspects such as discretization, gridding, upscaling, optimization, data assimilation, uncertainty assessment, and high-performance parallel and grid computing. Papers treating similar topics but with applications to other fields in the geosciences, such as geomechanics, geophysics, oceanography, or meteorology, will also be considered. The journal provides a platform for interaction and multidisciplinary collaboration among diverse scientific groups from both academia and industry, such as mathematicians, engineers, chemists, physicists, and geoscientists, who share an interest in developing mathematical models and efficient algorithms for solving them.