Remote estimates of suspended particulate matter in global lakes using machine learning models

IF 7.3 · Zone 1 (Agricultural and Forestry Sciences) · JCR Q1 (Environmental Sciences) · International Soil and Water Conservation Research · Pub Date: 2023-07-16 · DOI: 10.1016/j.iswcr.2023.07.002
Zhidan Wen, Qiang Wang, Yue Ma, Pierre Andre Jacinthe, Ge Liu, Sijia Li, Yingxin Shang, Hui Tao, Chong Fang, Lili Lyu, Baohua Zhang, Kaishan Song
International Soil and Water Conservation Research, 12(1), 200–216.

Abstract

Suspended particulate matter (SPM) in lakes strongly affects light propagation and aquatic ecosystem productivity, and it co-varies with nutrients, heavy metals, and micro-pollutants in the water. Because SPM strongly absorbs and backscatters light, it ultimately shapes the water-leaving signal that satellite sensors detect. Simple regression models based on specific bands or band ratios have been widely used to estimate SPM, but only with moderate accuracy. There is still room to improve model accuracy, and machine learning models may capture the non-linear relationships between spectral variables and SPM. We assembled more than 16,400 in situ SPM measurements from lakes on six continents (all except Antarctica), of which 9640 samples were matched with Landsat overpasses within ±7 days. Seven machine learning algorithms and two simple regression methods (linear and partial least squares models) were used to estimate lake SPM, and their performance was compared. To address the problem of imbalanced datasets in regression, the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN) was adopted. The comparison showed that the gradient boosting decision tree (GBDT), random forest (RF), and extreme gradient boosting (XGBoost) models trained on the SMOGN-processed dataset had good spatiotemporal transferability and the potential to map SPM in different years from good-quality Landsat surface reflectance images. Among all the tested modeling approaches, the GBDT model achieved accurate calibration (n = 6428, R² = 0.95, MAPE = 29.8%) using SPM collected from 2235 lakes worldwide, and its validation (n = 3214, R² = 0.84, MAPE = 38.8%) also showed stable performance. The RF model also performed well on both the calibration (R² = 0.93) and validation (R² = 0.86, MAPE = 24.2%) datasets.
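The abstract names SMOGN but does not describe how it rebalances a regression dataset. The core idea, inherited from SMOTE for regression (SMOTER), is to create synthetic samples in the rare part of the target range by interpolating between a rare case and one of its rare neighbours, with Gaussian noise added. The sketch below is a simplified, hypothetical helper (`simple_smoter_gn` is not the paper's code and not the `smogn` package): it assumes the high-SPM tail is the rare region and uses a fixed upper quantile instead of SMOGN's relevance function, and it adds small Gaussian noise to every synthetic case instead of applying SMOGN's safe-distance rule for choosing between interpolation and noise.

```python
import math
import random


def simple_smoter_gn(X, y, rare_quantile=0.9, n_new=50, k=3,
                     noise_frac=0.01, seed=0):
    """Hypothetical, simplified SMOTER + Gaussian-noise oversampler.

    X: list of feature rows (e.g. band reflectances); y: list of SPM values.
    Returns new lists with n_new synthetic rows appended. NOT the full SMOGN
    algorithm: rarity is an upper-quantile cut on y (no relevance function),
    and every synthetic case gets small Gaussian noise (no safe-distance rule).
    """
    rng = random.Random(seed)
    n, n_feat = len(y), len(X[0])
    threshold = sorted(y)[int(rare_quantile * (n - 1))]
    rare = [i for i in range(n) if y[i] >= threshold]  # assumed: high-SPM tail only
    if len(rare) < 2:
        raise ValueError("need at least two rare samples to interpolate")

    def sd(values):
        mean = sum(values) / len(values)
        return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

    # Noise scale per feature and for the target, as a fraction of each SD.
    feat_sd = [sd([row[j] for row in X]) for j in range(n_feat)]
    y_sd = sd(y)

    X_new, y_new = [], []
    for _ in range(n_new):
        s = rng.choice(rare)
        # k nearest rare neighbours of the seed in feature space.
        others = sorted((i for i in rare if i != s),
                        key=lambda i: math.dist(X[s], X[i]))
        nb = rng.choice(others[:k])
        gap = rng.random()  # position along the seed-to-neighbour segment
        X_new.append([X[s][j] + gap * (X[nb][j] - X[s][j])
                      + rng.gauss(0.0, noise_frac * feat_sd[j])
                      for j in range(n_feat)])
        # Same gap for y: equals SMOTER's distance-weighted target when no noise.
        y_new.append(y[s] + gap * (y[nb] - y[s])
                     + rng.gauss(0.0, noise_frac * y_sd))
    return X + X_new, y + y_new
```

With `noise_frac=0.0` the helper reduces to plain SMOTER-style interpolation, so every synthetic SPM value lies between two observed rare values; the noise term is what lets synthetic cases spread slightly beyond the observed ones.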
We applied the GBDT and RF models to map SPM in typical lakes and obtained satisfactory results. In addition, the GBDT model was evaluated against historical SPM measurements coincident with different Landsat sensors (L5-TM, L7-ETM+, and L8-OLI). The model therefore has the potential to map lake SPM, monitor its temporal variation, and track lake-water SPM dynamics over approximately the past four decades (1984–2021), since the launch of Landsat-5/TM in 1984.
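Evaluating one model against measurements coincident with several Landsat sensors amounts to grouping the validation matchups by sensor and scoring each group with the metrics reported above. A minimal sketch, assuming R² is the coefficient of determination 1 − SS_res/SS_tot (the abstract does not state whether squared Pearson r was used instead), MAPE is the mean of |predicted − observed| / observed in percent, and the matchups arrive as hypothetical `(sensor, observed, predicted)` tuples:

```python
from collections import defaultdict


def r2_score(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition).

    Needs at least two distinct observed values, otherwise SS_tot is zero.
    """
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mape(obs, pred):
    """Mean absolute percentage error in %; observed SPM must be > 0."""
    return 100.0 * sum(abs(p - o) / o for o, p in zip(obs, pred)) / len(obs)


def metrics_by_sensor(records):
    """records: iterable of (sensor, observed_spm, predicted_spm) tuples."""
    groups = defaultdict(lambda: ([], []))  # sensor -> (observed, predicted)
    for sensor, o, p in records:
        groups[sensor][0].append(o)
        groups[sensor][1].append(p)
    return {
        sensor: {"n": len(obs), "R2": r2_score(obs, pred), "MAPE": mape(obs, pred)}
        for sensor, (obs, pred) in groups.items()
    }
```

For example, with illustrative (not published) values, `metrics_by_sensor([("L5-TM", 10.0, 12.0), ("L5-TM", 20.0, 18.0), ("L5-TM", 30.0, 33.0)])` gives R² = 0.915 and MAPE ≈ 13.3% for L5-TM. Reporting metrics per sensor rather than pooled makes it visible whether accuracy drifts between TM, ETM+, and OLI.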
