A global gross primary productivity of sunlit and shaded canopies dataset from 2002 to 2020 via embedding random forest into two-leaf light use efficiency model
Zhilong Li, Ziti Jiao, Ge Gao, Jing Guo, Chenxia Wang, Sizhe Chen, Zheyou Tan
Data in Brief, vol. 58, Article 111298. Published 2025-02-01. DOI: 10.1016/j.dib.2025.111298
https://www.sciencedirect.com/science/article/pii/S2352340925000307
Citations: 0
Abstract
Gross primary productivity (GPP) is crucial for understanding the carbon cycle and maintaining ecosystem balance under climate change. We generated a long-term global dataset of GPP for sunlit (GPPsu) and shaded (GPPsh) leaves using a hybrid model that combines a random forest (RF) submodule with the two-leaf light use efficiency (TL-LUE) model. First, the TL-LUE model was optimized by accounting for seasonal differences in the clumping index at the global scale, yielding the TL-CLUE model. Then, we used the RF technique to integrate various environmental stress factors, including meteorological factors, hydrological variables, soil properties, and elevation, drawn from the NASA MERRA-2 dataset, ISRIC SoilGrids, and the USGS data center. The RF submodule was then embedded into the TL-CLUE model to construct the hybrid model (TL-CRF), which was trained and evaluated against global eddy covariance (EC) site data from the AmeriFlux and FLUXNET2015 datasets. Using the TL-CRF model, we produced a global dataset of GPP, GPPsu, and GPPsh at a spatial resolution of 0.05° × 0.05° for 2002–2020, driven by the LP DAAC leaf area index and land cover, NASA MERRA-2 incoming shortwave solar radiation, and the environmental variables listed above. This GPP product provides a data basis for improving our understanding of the dynamics of global vegetation productivity and its interactions with changing environmental conditions.
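The abstract does not state the model equations, so the following is only a minimal sketch of the general two-leaf idea, not the authors' TL-CRF implementation. It assumes the widely used expression for sunlit leaf area as a function of total leaf area index, clumping index, and solar zenith angle, and it treats the environmental stress term as a single scalar in [0, 1] supplied by the caller, standing in for whatever the RF submodule outputs. The function names, parameter names, and all numeric values are hypothetical.

```python
import math


def partition_lai(lai, clumping_index, solar_zenith_deg):
    """Split total LAI into sunlit and shaded parts.

    Assumed form (not taken from the abstract):
    LAI_su = 2*cos(theta) * (1 - exp(-0.5 * Omega * LAI / cos(theta)))
    LAI_sh = LAI - LAI_su
    """
    cos_theta = math.cos(math.radians(solar_zenith_deg))
    lai_sunlit = 2.0 * cos_theta * (
        1.0 - math.exp(-0.5 * clumping_index * lai / cos_theta)
    )
    lai_shaded = lai - lai_sunlit
    return lai_sunlit, lai_shaded


def two_leaf_gpp(apar_sunlit, apar_shaded, eps_sunlit, eps_shaded, stress):
    """Combine sunlit and shaded contributions into total GPP.

    `stress` is a hypothetical scalar in [0, 1]; in a hybrid model it
    would come from a trained regressor rather than being passed by hand.
    """
    gpp_sunlit = eps_sunlit * apar_sunlit * stress
    gpp_shaded = eps_shaded * apar_shaded * stress
    return gpp_sunlit, gpp_shaded, gpp_sunlit + gpp_shaded


if __name__ == "__main__":
    # Illustrative inputs only; none of these values come from the dataset.
    lai_su, lai_sh = partition_lai(lai=3.0, clumping_index=0.7, solar_zenith_deg=30.0)
    print("sunlit LAI:", lai_su, "shaded LAI:", lai_sh)

    gpp_su, gpp_sh, gpp = two_leaf_gpp(
        apar_sunlit=4.0, apar_shaded=2.0,
        eps_sunlit=1.0, eps_shaded=3.0, stress=0.8,
    )
    print("GPPsu:", gpp_su, "GPPsh:", gpp_sh, "GPP:", gpp)
```

In this form, a smaller clumping index (more clumped foliage) reduces the sunlit leaf area for the same total LAI, which is why seasonal variation in the clumping index affects the split between GPPsu and GPPsh.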
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.