{"title":"Data mining-based machine learning methods for improving hydrological data a case study of salinity field in the Western Arctic Ocean","authors":"Shuhao Tao, Ling Du, Jiahao Li","doi":"10.5194/essd-2024-138","DOIUrl":null,"url":null,"abstract":"<strong>Abstract.</strong> In the Western Arctic Ocean lies the largest freshwater reservoir in the Arctic Ocean, the Beaufort Gyre. Long-term changes in freshwater reservoirs are critical for understanding the Arctic Ocean, and data from various sources, particularly measured or reanalyzed data, must be used to the greatest extent possible. Over the past two decades, a large number of intensive field observations and ship surveys have been conducted in the western Arctic Ocean to obtain a large amount of CTD data. Multiple machine learning methods were evaluated and merged to reconstruct annual salinity product in the western Arctic Ocean over the period 2003–2022. Data mining-based machine learning methods make use of variables determined by physical processes, such as sea level pressure, sea ice concentration, and drift. Our objective is to effectively manage the mean root mean square error (RMSE) of sea surface salinity, which exhibits greater susceptibility to atmospheric, sea ice, and oceanic changes. Considering the higher susceptibility of sea surface salinity to atmospheric, sea ice, and oceanic changes, which leads to greater variability, we ensured that the average root mean square error of CTD and EN4 sea surface salinity field during the machine learning training process was constrained within 0.25 psu. The machine learning process reveals that the uncertainty in predicting sea surface salinity, as constrained by CTD data, is 0.24 %, whereas when constrained by EN4 data it reduces to 0.02 %. During data merging and post-calibrating, the weight coefficients are constrained by imposing limitations on the uncertainty value. Compared with commonly used EN4 and ORAS5 salinity in the Arctic Ocean, our salinity product provide more accurate descriptions of freshwater content in the Beaufort Gyre and depth variations at its halocline base. The application potential of this multi-machine learning results approach for evaluating and integrating extends beyond the salinity field, encompassing hydrometeorology, sea ice thickness, polar biogeochemistry, and other related fields. The datasets are available at https://zenodo.org/records/10990138 (Tao and Du, 2024).","PeriodicalId":48747,"journal":{"name":"Earth System Science Data","volume":"84 1","pages":""},"PeriodicalIF":11.2000,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Earth System Science Data","FirstCategoryId":"89","ListUrlMain":"https://doi.org/10.5194/essd-2024-138","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOSCIENCES, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Abstract. In the Western Arctic Ocean lies the largest freshwater reservoir in the Arctic Ocean, the Beaufort Gyre. Long-term changes in freshwater reservoirs are critical for understanding the Arctic Ocean, and data from various sources, particularly measured or reanalyzed data, must be used to the greatest extent possible. Over the past two decades, a large number of intensive field observations and ship surveys have been conducted in the western Arctic Ocean to obtain a large amount of CTD data. Multiple machine learning methods were evaluated and merged to reconstruct annual salinity product in the western Arctic Ocean over the period 2003–2022. Data mining-based machine learning methods make use of variables determined by physical processes, such as sea level pressure, sea ice concentration, and drift. Our objective is to effectively manage the mean root mean square error (RMSE) of sea surface salinity, which exhibits greater susceptibility to atmospheric, sea ice, and oceanic changes. Considering the higher susceptibility of sea surface salinity to atmospheric, sea ice, and oceanic changes, which leads to greater variability, we ensured that the average root mean square error of CTD and EN4 sea surface salinity field during the machine learning training process was constrained within 0.25 psu. The machine learning process reveals that the uncertainty in predicting sea surface salinity, as constrained by CTD data, is 0.24 %, whereas when constrained by EN4 data it reduces to 0.02 %. During data merging and post-calibrating, the weight coefficients are constrained by imposing limitations on the uncertainty value. Compared with commonly used EN4 and ORAS5 salinity in the Arctic Ocean, our salinity product provide more accurate descriptions of freshwater content in the Beaufort Gyre and depth variations at its halocline base. The application potential of this multi-machine learning results approach for evaluating and integrating extends beyond the salinity field, encompassing hydrometeorology, sea ice thickness, polar biogeochemistry, and other related fields. The datasets are available at https://zenodo.org/records/10990138 (Tao and Du, 2024).
Earth System Science DataGEOSCIENCES, MULTIDISCIPLINARYMETEOROLOGY-METEOROLOGY & ATMOSPHERIC SCIENCES
CiteScore
18.00
自引率
5.30%
发文量
231
审稿时长
35 weeks
期刊介绍:
Earth System Science Data (ESSD) is an international, interdisciplinary journal that publishes articles on original research data in order to promote the reuse of high-quality data in the field of Earth system sciences. The journal welcomes submissions of original data or data collections that meet the required quality standards and have the potential to contribute to the goals of the journal. It includes sections dedicated to regular-length articles, brief communications (such as updates to existing data sets), commentaries, review articles, and special issues. ESSD is abstracted and indexed in several databases, including Science Citation Index Expanded, Current Contents/PCE, Scopus, ADS, CLOCKSS, CNKI, DOAJ, EBSCO, Gale/Cengage, GoOA (CAS), and Google Scholar, among others.