Francisco Alonso-Sarría, Carmen Valdivieso-Ros, Francisco Gomariz-Castillo
Analysis of the hyperparameter optimisation of four machine learning satellite imagery classification methods
Computational Geosciences, published 2024-04-05. DOI: https://doi.org/10.1007/s10596-024-10285-y
Citations: 0
Abstract
The classification of land use and land cover (LULC) from remotely sensed imagery in semi-arid Mediterranean areas is a challenging task due to the fragmentation of the landscape and the diversity of spatial patterns. Recently, the use of deep learning (DL) for image analysis has increased compared to commonly used machine learning (ML) methods. This paper compares the performance of four algorithms, Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Convolutional Network (CNN), using multi-source data and an exhaustive hyperparameter optimisation process. The usual approach when optimising a LULC classification model is to keep the best model in terms of accuracy without analysing the rest of the results. In this study, we have analysed those results and found noteworthy patterns in a space defined by the mean and standard deviation of the validation accuracy estimated in a 10-fold cross-validation (CV). The point distributions in this space do not appear to be completely random; they show clusters of points that help identify hyperparameter values that tend to increase the mean accuracy and decrease its standard deviation. RF is not the most accurate model, but it is the least sensitive to changes in hyperparameters. Neural networks tend to increase commission and omission errors for the less-represented classes because their optimisation leads the model to learn the most frequent classes better. On the other hand, RF and MLP prediction layers are the most accurate from a general qualitative point of view.
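The kind of analysis the abstract describes, keeping the mean and standard deviation of the 10-fold validation accuracy for every hyperparameter combination rather than only the best one, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic two-class points, a small k-nearest-neighbour classifier stands in for RF, SVM, MLP and CNN, and the names `make_data`, `knn_predict`, `cv_scores` and `grid_search` are hypothetical.

```python
import random
import statistics
from itertools import product


def make_data(n=200, seed=0):
    """Synthetic two-class 2-D points (illustrative only, not the paper's data)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 2.0
        point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k, p):
    """Majority vote among the k nearest training samples.

    Neighbours are ranked by the sum of |a - b|**p, which gives the same
    ordering as the Minkowski distance of order p.
    """
    ranked = sorted(
        (sum(abs(a - b) ** p for a, b in zip(x, point)), y) for x, y in train
    )
    votes = [y for _, y in ranked[:k]]
    return max(set(votes), key=votes.count)


def cv_scores(data, k, p, n_folds=10):
    """Validation accuracy of each fold in an n_folds cross-validation."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for i, validation in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        hits = sum(knn_predict(train, x, k, p) == y for x, y in validation)
        accuracies.append(hits / len(validation))
    return accuracies


def grid_search(data, grid):
    """Evaluate every combination and keep mean and std, not just the best."""
    results = []
    for k, p in product(grid["k"], grid["p"]):
        accuracies = cv_scores(data, k, p)
        results.append({
            "k": k,
            "p": p,
            "mean": statistics.mean(accuracies),
            "std": statistics.stdev(accuracies),  # sample standard deviation
        })
    return results


data = make_data()
# Odd k values avoid voting ties with two classes.
results = grid_search(data, {"k": [1, 5, 15], "p": [1, 2]})

# Each result is one point in the (mean, std) space; sorting by high mean and
# low std makes groups of similar hyperparameter settings easier to spot.
for r in sorted(results, key=lambda r: (-r["mean"], r["std"])):
    print(r)
```

Plotting `mean` against `std` for all entries of `results`, coloured by one hyperparameter at a time, would be one way to look for the clusters mentioned in the abstract; the grid used here is far smaller than an exhaustive search.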
About the journal:
Computational Geosciences publishes high-quality papers on mathematical modeling, simulation, numerical analysis, and other computational aspects of the geosciences. In particular, the journal is focused on advanced numerical methods for the simulation of subsurface flow and transport, and associated aspects such as discretization, gridding, upscaling, optimization, data assimilation, uncertainty assessment, and high-performance parallel and grid computing.
Papers treating similar topics but with applications to other fields in the geosciences, such as geomechanics, geophysics, oceanography, or meteorology, will also be considered.
The journal provides a platform for interaction and multidisciplinary collaboration among diverse scientific groups, from both academia and industry, which share an interest in developing mathematical models and efficient algorithms for solving them, such as mathematicians, engineers, chemists, physicists, and geoscientists.