Coastal science has entered a new era of data-driven research, facilitated by satellite data and cloud computing. Despite its potential, the coastal community has yet to fully capitalize on these advancements due to a lack of tailored data, tools, and models. This paper demonstrates how cloud technology can advance coastal analytics at scale. We introduce GCTS, a novel foundational dataset comprising over 11 million coastal transects at 100-m resolution. Our experiments highlight the importance of cloud-optimized data formats, geospatial sorting, and metadata-driven data retrieval. By leveraging cloud technology, we achieve up to 700 times faster performance for tasks like coastal waterline mapping. A case study reveals that 33% of the world’s first kilometer of coast is below 5 m, with the entire analysis completed in a few hours. Our findings make a compelling case for the coastal community to start producing data, tools, and models suitable for scalable coastal analytics.
Enabling coastal analytics at planetary scale. Floris Reinier Calkoen, Arjen Pieter Luijendijk, Kilian Vos, Etiënne Kras, Fedor Baart. Environmental Modelling & Software, Volume 183, Article 106257. Pub Date: 2024-11-08; DOI: 10.1016/j.envsoft.2024.106257
Pub Date: 2024-11-08; DOI: 10.1016/j.envsoft.2024.106261
Bangjie Fu, Yange Li, Chen Wang, Zheng Han, Nan Jiang, Wendu Xie, Changli Li, Haohui Ding, Weidong Wang, Guangqi Chen
Recent studies have demonstrated the effectiveness of Convolutional Neural Networks (CNNs) in landslide detection. With the rapid development of Remote Sensing and Geographic Information System technologies, an increasing amount of spectral and non-spectral information is available for CNN modeling. This offers a comprehensive perspective for landslide detection but also presents challenges to CNNs, especially in efficiently learning long-range feature associations. Therefore, we propose a novel Transformer-improved VGG network (Trans-VGG). It takes spectral (RGB images) and non-spectral information (elevation, slope, and PCA components) as data inputs and integrates both local and global features in modeling. The method is tested in two landslide cluster areas in Litang County, China. The results at site a show that the Trans-VGG model achieves an improvement in F1-score ranging from 4% to 21% compared with conventional machine learning and CNN models. The validation result at site b further supports the validity of the proposed method.
Transformer-embedded 1D VGG convolutional neural network for regional landslides detection boosted by multichannel data inputs. Environmental Modelling & Software, Volume 183, Article 106261.
Pub Date: 2024-11-06; DOI: 10.1016/j.envsoft.2024.106253
Xin Tong, Bryan Quaife
Data-driven techniques are increasingly being applied to complement physics-based models in fire science. However, the lack of sufficiently large datasets continues to hinder the application of certain machine learning techniques. In this paper, we use simulated data to investigate the ability of neural networks to parameterize dynamics in fire science. In particular, we investigate neural networks that map five key parameters in fire spread to the first arrival time, and the corresponding inverse problem. By using simulated data, we are able to characterize the error, the required dataset size, and the convergence properties of these neural networks. For the inverse problem, we quantify the network’s sensitivity in estimating each of the key parameters. The findings demonstrate the potential of machine learning in fire science, highlight the challenges associated with limited dataset sizes, and quantify the sensitivity of neural networks to estimate key parameters governing fire spread dynamics.
Data-driven fire modeling: Learning first arrival times and model parameters with neural networks. Environmental Modelling & Software, Volume 183, Article 106253.
Pub Date: 2024-11-04; DOI: 10.1016/j.envsoft.2024.106260
Bao Liu, Siqi Chen, Lei Gao
Understanding spatiotemporal variations in forest cover is crucial for effective forest resource management. However, existing models often lack accuracy in simultaneously capturing temporal continuity and spatial correlation. To address this challenge, we developed ResConvLSTM-Att, a novel hybrid model integrating residual neural networks, Convolutional Long Short-Term Memory (ConvLSTM) networks, and attention mechanisms. We evaluated ResConvLSTM-Att against four deep learning models: LSTM, combined convolutional neural network and LSTM (CNN-LSTM), ConvLSTM, and ResConvLSTM for spatiotemporal prediction of forest cover in Tasmania, Australia. ResConvLSTM-Att achieved outstanding prediction performance, with an average root mean square error (RMSE) of 6.9% coverage and an impressive average coefficient of determination of 0.965. Compared with LSTM, CNN-LSTM, ConvLSTM, and ResConvLSTM, ResConvLSTM-Att achieved RMSE reductions of 31.2%, 43.0%, 10.1%, and 6.5%, respectively. Additionally, we quantified the impacts of explanatory variables on forest cover dynamics. Our work demonstrated the effectiveness of ResConvLSTM-Att in spatiotemporal data modelling and prediction.
Combining residual convolutional LSTM with attention mechanisms for spatiotemporal forest cover prediction. Environmental Modelling & Software, Volume 183, Article 106260.
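The forest cover abstract above reports performance as RMSE, coefficient of determination, and relative RMSE reduction against baseline models. As a reading aid, the following minimal sketch shows how these three quantities are commonly computed. The function names and the sample numbers are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Made-up forest cover percentages, for illustration only.
observed_cover = [10.0, 20.0, 30.0, 40.0]
predicted_cover = [12.0, 18.0, 33.0, 39.0]

example_rmse = rmse(observed_cover, predicted_cover)
example_r2 = r_squared(observed_cover, predicted_cover)
example_reduction = relative_reduction(10.0, 9.0)
```

Under this definition, a relative reduction is measured against the baseline model's RMSE, so a larger percentage means the baseline was further behind.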
Pub Date: 2024-11-03; DOI: 10.1016/j.envsoft.2024.106255
Elisa Bayraktarov, Samantha Low-Choy, Abhimanyu Raj Singh, Linda J. Beaumont, Kristen J. Williams, John B. Baumgartner, Shawn W. Laffan, Daniela Vasco, Robert Cosgrove, Jenna Wraith, Jessica Fenker Antunes, Brendan Mackey
Biodiversity decline and climate change are among the most important environmental issues society faces. Information to address these issues has benefited from increasing big data, advances in cloud computing, and subsequent new tools for analytics. Accessing such tools is streamlined by virtual laboratories for ecological analysis, like the ‘Biodiversity and Climate Change Virtual Laboratory’ (BCCVL) and ‘ecocloud’. These platforms help reduce time and effort spent on developing programming skills, data acquisition and curation, plus model building. Recently this functionality was extended, producing EcoCommons Australia—a web-based ecological modeling platform for environmental problem-solving—with upgraded infrastructure and improved ensemble modeling, post-model analysis, workflow transparency and reproducibility. We outline our user-centered approach to systems design, from initial surveys of stakeholder needs to user involvement in testing, and collaboration with specialists. We illustrate EcoCommons and compare model evaluation statistics through four case studies, highlighting how the modular platform meets users' needs.
EcoCommons Australia virtual laboratories with cloud computing: Meeting diverse user needs for ecological modeling and decision-making. Environmental Modelling & Software, Volume 183, Article 106255.
Pub Date: 2024-11-02; DOI: 10.1016/j.envsoft.2024.106254
Nicolò Perello, Andrea Trucchia, Mirko D’Andrea, Silvia Degli Esposti, Paolo Fiorucci, Andrea Gollini, Dario Negro
Estimating the Dead Fuel Moisture Content (DFMC) is crucial in wildfire risk management, representing a key component in forest fire danger rating systems and wildfire simulation models. DFMC fluctuates sub-daily and spatially, influenced by local weather and fuel characteristics. This necessitates models that provide sub-daily fuel moisture conditions for improving wildfire risk management. Many forest fire danger rating systems rely on daily fuel moisture models that overlook local fuel characteristics, with a consequent impact on wildfire management. The proposed semi-empirical parametric DFMC model addresses these issues by providing hourly dead fuel moisture dynamics, with specific parameters that account for local fuel characteristics. A calibration framework based on a Particle Swarm Optimization-type algorithm is proposed. In the present study, the calibration framework has been tested using hourly 10-h fuel stick measurements. Implementing this model in forest fire danger rating systems would add detail to forest fire danger conditions, advancing wildfire risk management.
An adaptable dead fuel moisture model for various fuel types and temporal scales tailored for wildfire danger assessment. Environmental Modelling & Software, Volume 183, Article 106254.
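The dead fuel moisture abstract above describes calibrating an hourly model against fuel stick measurements with a Particle Swarm Optimization-type algorithm. The sketch below illustrates that general idea only. The one-parameter time-lag model, the synthetic equilibrium moisture series, the swarm settings (inertia 0.7, acceleration 1.5), and all function names are assumptions made for illustration; they are not the authors' model or calibration framework.

```python
import math
import random


def simulate_moisture(emc, tau_hours, m0, dt_hours=1.0):
    """Toy time-lag model: moisture relaxes toward equilibrium (EMC) each hour."""
    decay = 1.0 - math.exp(-dt_hours / tau_hours)
    moisture = [m0]
    for target in emc[:-1]:
        moisture.append(moisture[-1] + (target - moisture[-1]) * decay)
    return moisture


def sse(tau_hours, emc, observed):
    """Sum of squared errors between simulated and observed moisture."""
    simulated = simulate_moisture(emc, tau_hours, observed[0])
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def pso_calibrate(loss, lower, upper, n_particles=15, n_iter=60, seed=0):
    """Basic one-dimensional global-best particle swarm within [lower, upper]."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]
    best_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=best_loss.__getitem__)
    g_pos, g_loss = best_pos[g], best_loss[g]
    history = [g_loss]  # best loss found so far, recorded per iteration
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            # Clamp to the search bounds so the parameter stays physical.
            pos[i] = min(upper, max(lower, pos[i] + vel[i]))
            current = loss(pos[i])
            if current < best_loss[i]:
                best_pos[i], best_loss[i] = pos[i], current
                if current < g_loss:
                    g_pos, g_loss = pos[i], current
        history.append(g_loss)
    return g_pos, g_loss, history


# Synthetic 72-hour example: a daily EMC cycle and "observations" generated
# with a 10-hour time lag (all values are made up).
emc = [15.0 + 8.0 * math.sin(2.0 * math.pi * h / 24.0) for h in range(72)]
observed = simulate_moisture(emc, 10.0, 20.0)
tau_hat, best, history = pso_calibrate(lambda t: sse(t, emc, observed), 1.0, 50.0)
```

In this toy setup the swarm searches for the time lag that best reproduces the synthetic observations; a real calibration would involve several parameters and measured data.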
Pub Date: 2024-10-29; DOI: 10.1016/j.envsoft.2024.106258
Jian Ji, Bin Tong, Hong-Zhi Cui, Xin-Tao Tang, Marcel Hürlimann, Shigui Du
Earthquake-induced regional landslides frequently result in substantial economic losses and casualties. Conducting landslide susceptibility assessments is essential for mitigating these risks and minimizing potential damage. To address the diverse needs of professionals in various disciplines, we have developed an open-source plugin for QGIS, named QGIS-FORM. This plugin integrates the functions of both a physically-based model (PM) and a physically-based probabilistic model (PPM). The PM employs a pseudo-static infinite slope stability model, while the PPM utilizes an improved first order reliability method (FORM) to perform landslide probability analysis over a spatial region. To verify its effectiveness, the plugin was applied to the Maerkang landslide event in 2022. Based on the PM and the PPM, the landslide susceptibility assessments were evaluated using several parameters including slope, aspect, stratum, and PGA. In addition, the Receiver Operating Characteristic (ROC) curve and Balanced Accuracy were employed to assess their predictive performance. The landslide susceptibility results indicate that landslides in Maerkang are mostly concentrated on slopes between 30° and 50°, and that the geological conditions of the Xinduqiao Formation (T₃X) are more prone to landslides. Compared to the PM, the PPM can achieve higher AUC values when the parameter uncertainties are properly characterized. Overall, the PPM exhibits higher accuracy and is more capable of identifying potential landslides than the PM, thereby offering a more reliable scientific basis for the management and mitigation of landslide disaster risks.
A QGIS framework for physically-based probabilistic modelling of landslide susceptibility: QGIS-FORM. Environmental Modelling & Software, Volume 183, Article 106258.
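The QGIS-FORM abstract above names a pseudo-static infinite slope stability model as the deterministic component. A minimal sketch of the textbook form of that calculation is given below, assuming a dry slope (no pore pressure) and a horizontal seismic coefficient. The function name, parameter names, and sample values are hypothetical, and the plugin's actual formulation and its FORM-based probabilistic step are not reproduced here.

```python
import math


def pseudo_static_fs(slope_deg, cohesion_kpa, friction_deg,
                     unit_weight_kn_m3, depth_m, k_h):
    """Factor of safety of a dry infinite slope under a horizontal seismic
    coefficient k_h (textbook pseudo-static form, no pore pressure)."""
    beta = math.radians(slope_deg)
    phi = math.radians(friction_deg)
    # Weight of the soil column per unit area of the slip surface,
    # with depth measured vertically.
    weight = unit_weight_kn_m3 * depth_m * math.cos(beta)
    # Stress normal to the slip surface, reduced by the seismic force.
    normal = weight * (math.cos(beta) - k_h * math.sin(beta))
    # Shear stress along the slip surface, increased by the seismic force.
    driving = weight * (math.sin(beta) + k_h * math.cos(beta))
    if driving <= 0.0:
        raise ValueError("no driving stress: slope is flat or inputs are invalid")
    resisting = cohesion_kpa + max(normal, 0.0) * math.tan(phi)
    return resisting / driving


# Illustrative values only.
fs_static = pseudo_static_fs(30.0, 0.0, 30.0, 20.0, 2.0, 0.0)
fs_seismic = pseudo_static_fs(30.0, 0.0, 30.0, 20.0, 2.0, 0.2)
```

For a cohesionless slope with no shaking, this expression reduces to tan(φ)/tan(β), which is a convenient sanity check; adding a positive k_h lowers the factor of safety.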
Pub Date: 2024-10-28; DOI: 10.1016/j.envsoft.2024.106251
Zhao Sun, Yongxian Wang
Sea surface temperature (SST) is crucial for studying global oceans and evaluating ecosystems. Accurately predicting short- and mid-term daily SST has been a significant challenge in oceanography. Traditional deep learning methods can handle temporal data and spatial features but often struggle with long-range spatiotemporal dependencies. To address this, we propose a coordination attention residual U-Net (CResU-Net) model designed to better capture the dynamic spatiotemporal correlations of high-resolution SST. The model integrates coordinate attention mechanisms, multiple residual modules, and depthwise separable convolutions to enhance prediction capabilities. The spatiotemporal variations of SST across different areas of the South China Sea are complex, making accurate predictions challenging. Experiments across various regions of the South China Sea show the model’s effectiveness and robust generalization in predicting high-resolution daily SST. For a 10-day forecast period, the model achieves an RMSE of approximately 0.3 °C, outperforming several advanced models.
A coordination attention residual U-Net model for enhanced short and mid-term sea surface temperature prediction. Environmental Modelling & Software, Volume 183, Article 106251.
Pub Date: 2024-10-25; DOI: 10.1016/j.envsoft.2024.106238
Yiran Ji, Feifei Zheng, Jinhua Wen, Qifeng Li, Junyi Chen, Holger R. Maier, Hoshin V. Gupta
Development of environmental models generally requires available data to be split into “development” and “evaluation” subsets. How this is done can significantly affect a model's outputs and performance. However, data splitting is generally done in a subjective, ad-hoc manner, with little justification, raising questions regarding the reliability of the findings of many modelling studies. To address this issue, we present and demonstrate the value of an R-package along with high-level guidelines for implementing many state-of-the-art data splitting methods in order to develop the model in a considered, defensible, consistent, repeatable and transparent fashion, thereby improving the generalizability of the resulting models. Results from two rainfall-runoff case studies show that models with high generalization ability can be achieved even when the available data contain rare, extreme events. Additionally, data splitting methods can be used to explicitly quantify the parameter uncertainty associated with data splitting and the resulting bounds on model predictions.
An R package to partition observation data used for model development and evaluation to achieve model generalizability. Environmental Modelling & Software, Volume 183, Article 106238.
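The data splitting abstract above argues for a considered, repeatable split into development and evaluation subsets instead of an ad-hoc one. One simple way to illustrate that idea is a systematic split on the sorted output variable, which spreads both typical and extreme values across the two subsets. The sketch below is an assumption-laden illustration in Python; it is not the R package's interface, and it is not claimed to be one of the methods the package implements.

```python
def systematic_split(values, eval_every=4):
    """Deterministically assign every `eval_every`-th record, ranked by
    value, to the evaluation subset; the rest go to development.

    Returns two lists of original indices, each in ascending order.
    """
    if eval_every < 2:
        raise ValueError("eval_every must be at least 2")
    order = sorted(range(len(values)), key=lambda i: values[i])
    development, evaluation = [], []
    for rank, idx in enumerate(order):
        if rank % eval_every == eval_every - 1:
            evaluation.append(idx)
        else:
            development.append(idx)
    return sorted(development), sorted(evaluation)


# Made-up runoff values with two rare extremes (15.0 and 30.0).
flows = [1.0, 1.2, 0.8, 15.0, 2.0, 1.1, 0.9, 30.0]
dev_idx, eval_idx = systematic_split(flows)
```

Because the assignment depends only on the ranked values, rerunning it on the same data gives the same subsets. In this example the two extremes end up in different subsets, whereas a purely chronological split would have left the largest event in whichever subset covers the end of the record.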