IPB-MSA&SO4: a daily 0.25° resolution dataset of in situ-produced biogenic methanesulfonic acid and sulfate over the North Atlantic during 1998–2022 based on machine learning

Earth System Science Data (IF 11.2; CAS Zone 1, Earth Sciences; JCR Q1, Geosciences, Multidisciplinary). Published 2024-06-12. DOI: 10.5194/essd-16-2717-2024
Karam Mansour, Stefano Decesari, Darius Ceburnis, Jurgita Ovadnevaite, Lynn M. Russell, Marco Paglione, Laurent Poulain, Shan Huang, Colin O'Dowd, Matteo Rinaldi

Abstract

Accurate, long-term records of marine-derived biogenic sulfur aerosol concentrations at high spatial and temporal resolution are critical for a wide range of studies, including climatology, trend analysis, and model evaluation. This information is also imperative for accurately quantifying the contribution of marine-derived biogenic sulfur aerosol to the aerosol burden, elucidating its radiative impacts, and providing boundary conditions for regional models. By applying machine learning algorithms, we constructed the first publicly available daily gridded dataset of in situ-produced biogenic methanesulfonic acid (MSA) and non-sea-salt sulfate (nss-SO4=) concentrations covering the North Atlantic. The dataset has a high spatial resolution (0.25° × 0.25°) and spans 25 years (1998–2022), far exceeding what observations alone could achieve both spatially and temporally.

The machine learning models were generated by combining in situ observations of sulfur aerosol from the Mace Head Atmospheric Research Station, on the west coast of Ireland, and from the North Atlantic Aerosols and Marine Ecosystems Study (NAAMES) cruises in the northwestern Atlantic with the constructed sea-to-air dimethylsulfide flux (FDMS) and ECMWF ERA5 reanalysis datasets. To determine the optimal regression method, we employed five machine learning model types: support vector machines, decision trees, regression ensembles, Gaussian process regression, and artificial neural networks. A comparison of the mean absolute error (MAE), root-mean-square error (RMSE), and coefficient of determination (R²) revealed that Gaussian process regression (GPR) was the most effective algorithm, outperforming the other models in simulating biogenic MSA and nss-SO4= concentrations. For predicting daily MSA (nss-SO4=), GPR achieved the highest R² of 0.86 (0.72) and the lowest MAE of 0.014 (0.10) µg m−3.
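The model comparison described above ranks candidate algorithms by MAE, RMSE, and R². A minimal sketch of these three metrics in plain Python follows, assuming the common definition R² = 1 − SS_res/SS_tot (the abstract does not state which R² convention was used). The observation and prediction values, and the names `model_a` and `model_b`, are hypothetical and invented only for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical daily MSA observations and model predictions (µg m^-3), invented for illustration.
observed = [0.02, 0.05, 0.11, 0.08, 0.03]
predicted = {
    "model_a": [0.03, 0.05, 0.09, 0.08, 0.04],
    "model_b": [0.05, 0.04, 0.06, 0.10, 0.02],
}

for name, pred in predicted.items():
    print(f"{name}: MAE={mae(observed, pred):.4f} "
          f"RMSE={rmse(observed, pred):.4f} R2={r2(observed, pred):.3f}")
```

Lower MAE and RMSE and higher R² indicate a better fit. With these toy numbers `model_a` wins on all three metrics, which is the situation in which a single algorithm can be selected without trading one metric off against another.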
GPR partial dependence analysis suggests that the relationships between the predictors and MSA and nss-SO4= concentrations are complex rather than linear. Using the GPR algorithm, we produced a high-resolution daily dataset of in situ-produced biogenic MSA and nss-SO4= sea-level concentrations over the North Atlantic, which we named “In-situ Produced Biogenic Methanesulfonic Acid and Sulfate over the North Atlantic” (IPB-MSA&SO4). The IPB-MSA&SO4 data allowed us to analyze the spatiotemporal patterns of MSA and nss-SO4= as well as the ratio between them (MSA:nss-SO4=). A comparison with the existing Copernicus Atmosphere Monitoring Service ECMWF Atmospheric Composition Reanalysis 4 (CAMS-EAC4) suggested that our high-resolution dataset reproduces the spatial and temporal patterns of biogenic sulfur aerosol concentrations with high accuracy, and the dataset is highly consistent with independent measurements in the Atlantic Ocean. IPB-MSA&SO4 is publicly available at https://doi.org/10.17632/j8bzd5dvpx.1 (Mansour et al., 2023b).
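GPR and partial dependence can be illustrated with a small, dependency-free sketch: a zero-mean Gaussian process with a squared-exponential (RBF) kernel, solved via a Cholesky factorization, plus a model-agnostic partial dependence function that averages predictions while one predictor is held at fixed values. This is not the authors' configuration. The kernel, its fixed hyperparameters (`length_scale`, `variance`, `noise`), the class `SimpleGPR`, and the two-predictor toy data are all assumptions made for illustration. Full GPR implementations typically also tune hyperparameters by maximizing the marginal likelihood, which this sketch skips.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(m):
    """Lower-triangular L with L L^T == m (m must be symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class SimpleGPR:
    """Hypothetical zero-mean GP regressor with fixed RBF hyperparameters (no tuning)."""

    def __init__(self, length_scale=1.0, variance=1.0, noise=1e-4):
        self.length_scale = length_scale
        self.variance = variance
        self.noise = noise

    def fit(self, X, y):
        self.X = X
        n = len(X)
        # Training covariance with a small noise term on the diagonal.
        K = [[rbf_kernel(X[i], X[j], self.length_scale, self.variance)
              + (self.noise if i == j else 0.0) for j in range(n)] for i in range(n)]
        self.alpha = solve_cholesky(cholesky(K), y)
        return self

    def predict(self, X_new):
        # Posterior mean: k(x*, X) . alpha
        return [sum(rbf_kernel(x, xi, self.length_scale, self.variance) * a
                    for xi, a in zip(self.X, self.alpha)) for x in X_new]

def partial_dependence(model, X, feature, grid):
    """Average prediction when column `feature` is set to each grid value in every row of X."""
    pd_values = []
    for value in grid:
        X_mod = [row[:feature] + [value] + row[feature + 1:] for row in X]
        preds = model.predict(X_mod)
        pd_values.append(sum(preds) / len(preds))
    return pd_values

# Hypothetical two-predictor toy data on a 5 x 5 grid in [0, 1]^2; the target is a
# made-up nonlinear function, not real MSA or nss-SO4 data.
X = [[x1 / 4, x2 / 4] for x1 in range(5) for x2 in range(5)]
y = [math.sin(2 * a) + 0.5 * b for a, b in X]

gpr = SimpleGPR(length_scale=0.3).fit(X, y)
print(partial_dependence(gpr, X, feature=0, grid=[0.0, 0.5, 1.0]))
```

For the toy target sin(2a) + 0.5b, the partial dependence on the first predictor tracks sin(2a) + 0.25 (0.25 being the average contribution of the second predictor), rising steeply and then flattening, which is the kind of nonlinear response a partial dependence analysis is meant to expose. For real work, scikit-learn's `GaussianProcessRegressor` and `sklearn.inspection.partial_dependence` cover the same steps, including hyperparameter fitting.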

Source journal: Earth System Science Data
Categories: Geosciences, Multidisciplinary; Meteorology & Atmospheric Sciences
CiteScore: 18.00
Self-citation rate: 5.30%
Articles published: 231
Review time: 35 weeks
About the journal: Earth System Science Data (ESSD) is an international, interdisciplinary journal that publishes articles on original research data in order to promote the reuse of high-quality data in the field of Earth system sciences. The journal welcomes submissions of original data or data collections that meet the required quality standards and have the potential to contribute to the goals of the journal. It includes sections dedicated to regular-length articles, brief communications (such as updates to existing data sets), commentaries, review articles, and special issues. ESSD is abstracted and indexed in several databases, including Science Citation Index Expanded, Current Contents/PCE, Scopus, ADS, CLOCKSS, CNKI, DOAJ, EBSCO, Gale/Cengage, GoOA (CAS), and Google Scholar.