The amount of research data in Earth System Modelling is growing fast, and so is the demand for solutions to keep data and workflows archived according to the FAIR principles—findable, accessible, interoperable and re-usable. In practice, numerous simulations are carried out during model development and tuning, in which different model versions or parameter values are tested. Often, this approach leads to opaque workflows and legacy data sets that lack findability and re-usability. Here, we present a strategy to facilitate the FAIRness of the active model testing workflow, starting with making existing legacy data sets findable and re-usable retrospectively, and automating FAIR workflows for subsequent data analysis and further model development using a semantic data management framework. We provide a general strategy, a specific use case and a technical implementation of the required steps, i.e., inventorisation, integration, documentation and analysis, using an example legacy simulation data set. The technical solution is built on the open-source semantic research data management system LinkAhead. The crawler in the LinkAhead framework automatically extracts relevant parameters for subsequent data analysis. A bidirectional connection between the database and a Jupyter Notebook enables seamless access to data and metadata through semantic queries, as well as storage of data analysis scripts and outputs linked to the original data in a FAIR manner. A major advantage of this approach is its flexibility: the crawler leaves the original data file structure untouched and can be adapted iteratively to variations in the data structure. FAIR workflows in model development, especially at the group or project level, avoid unnecessary repetition of simulations caused by lacking findability, thereby enhancing the efficiency of model development and reducing computation time and energy. Such data integration tools promote sustainable management of research data in the Geosciences.
{"title":"FAIR Workflows in Earth System Modelling: A Use Case With Semantic Data Management","authors":"Sinikka T. Lennartz, Alexander Schlemmer","doi":"10.1002/gdj3.70055","DOIUrl":"https://doi.org/10.1002/gdj3.70055","url":null,"abstract":"<p>The amount of research data in Earth System Modelling is growing fast, and so is the demand for solutions to keep data and workflows archived according to the FAIR principles—findable, accessible, interoperable and re-usable. In practice, numerous simulations are carried out during model development and tuning, in which different model versions or parameter values are tested. Often, this approach leads to intransparent workflows and legacy data sets that lack findability and re-usability criteria. Here, we present a strategy to facilitate the FAIRness of the active model testing workflow, starting with making existing legacy data sets findable and re-usable retrospectively, and automating FAIR workflows for subsequent data analysis and further model development using a semantic data managment framework. We provide a general strategy, specific use case and technical implementation of the required steps, i.e., inventorisation, integration, documentation and analysis using an example simulation legacy data set. The technical solution is implemented based on the open source semantic research data management system LinkAhead. The crawler in the LinkAhead framework automatically extracts relevant parameters for subsequent data analysis. A bidirectional connection of the database to a Jupyter Notebook enables seamless access of data and metadata through semantic queries, as well as storage of data analysis scripts and outputs linked to the original data in a FAIR manner. A major advantage of this approach is its flexibility: the crawler itself leaves the original data file structure untouched and can iteratively be adapted to variations in the data structure. FAIR workflows in model development, especially at the group or project level, avoid unneccessary repetition of simulations due to lacking findability, therefore enhance efficiency of model development and reduce computation time and energy. Such data integration tools enhance sustainable management of research data in Geosciences.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70055","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146007658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The volume of tide gauge data available to the sea level community has grown substantially, with information distributed across numerous global, national, and institutional data centres. As a result, the main challenge is no longer accessing data, but identifying the most relevant dataset for a given application. Currently, more than 15 global data centres provide sea level information, each tailored to different users and use cases (e.g., real-time monitoring, delayed-mode analysis, monthly means). For users unfamiliar with tide gauge data, selecting the appropriate source can be difficult. Tide Gauge CATalog (TGCAT) is a software tool developed to address this challenge. It helps users discover where specific tide gauge data are available and assists data providers and centres in identifying inconsistencies, such as misreferenced stations or discrepancies in metadata. TGCAT collects metadata from global and national sea level data centres to produce intercomparable catalogues. It also allows visualisation of data availability timelines across multiple sources. Written entirely in Python and linked to an online dashboard (www.sonel.org/tgcat), TGCAT is designed as an open, community-based platform. Its goal is to improve data discoverability, support better referencing practices, and help users navigate the complex landscape of tide gauge data portals.
{"title":"TGCAT—A Tool to Analyse the Content of Sea Level Data Portals","authors":"Laurent Testut, Adrien Laval, Clémence Chupin, Mikaël Guichard, Begoña Pérez Gómez","doi":"10.1002/gdj3.70047","DOIUrl":"https://doi.org/10.1002/gdj3.70047","url":null,"abstract":"<p>The volume of tide gauge data available to the sea level community has grown substantially, with information distributed across numerous global, national, and institutional data centres. As a result, the main challenge is no longer accessing data, but identifying the most relevant dataset for a given application. Currently, more than 15 global data centres provide sea level information, each tailored to different users and use cases (e.g., real-time monitoring, delayed-mode analysis, monthly means). For users unfamiliar with tide gauge data, selecting the appropriate source can be difficult. Tide Gauge CATalog (TGCAT) is a software tool developed to address this challenge. It helps users discover where specific tide gauge data are available and assists data providers and centres in identifying inconsistencies, such as misreferenced stations or discrepancies in metadata. TGCAT collects metadata from global and national sea level data centres to produce intercomparable catalogues. It also allows visualisation of data availability timelines across multiple sources. Written entirely in Python and linked to an online dashboard (www.sonel.org/tgcat), TGCAT is designed as an open, community-based platform. Its goal is to improve data discoverability, support better referencing practices, and help users navigate the complex landscape of tide gauge data portals.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70047","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146002023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comprehensive soil data at multiple depths are essential for climate change assessments, biodiversity analyses, and land-use monitoring. However, generating such information through laboratory analyses is costly and labor-intensive. Existing global resources, such as SoilGrids and statistical packages like geodata, provide raster data for soil layers at 0–5 cm, 5–15 cm, and 15–30 cm, but many applications require alternative depth intervals—for example, 0–20 cm for carbon stock modelling (e.g., CENTURY) or 0–30 cm for ecosystem services (e.g., InVEST) and studies of plant growth, water availability, and pollutant dynamics. To address this gap, we developed a global database of raster soil data at depths of 0–10 cm, 0–15 cm, 0–20 cm, 0–25 cm, and 0–30 cm. Soil attributes at specific depths were computed through interpolation and weighted averaging of SoilGrids rasters and validated against empirical soil data from multiple continents, compiled from major global repositories and peer-reviewed studies. Standard statistical procedures confirmed the robust accuracy of the interpolated rasters. The database provides values for bulk density (Mg m−3), soil texture (%), soil acidity (pH), total nitrogen (g kg−1), soil organic carbon (g kg−1), and soil organic carbon stock (Mg ha−1). Potential applications include (i) biogeochemical modelling of soil carbon, (ii) aboveground biomass modelling, (iii) species distribution modelling, (iv) biodiversity assessments, and (v) ecosystem diagnostics. The dataset is open access under a Creative Commons licence, adheres to the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, and is hosted on Zenodo owing to its large size (access link: https://doi.org/10.5281/zenodo.14721139). Users are kindly requested to cite this paper when employing the dataset.
{"title":"Multi-Depth Soil Attributes Dataset","authors":"Jhonatan Rafael Zárate-Salazar, Eduardo Vinicius da Silva Oliveira, Wadson de Jesus Correia, Sidney Feitosa Gouveia","doi":"10.1002/gdj3.70052","DOIUrl":"https://doi.org/10.1002/gdj3.70052","url":null,"abstract":"<p>Comprehensive soil data at multiple depths are essential for climate change assessments, biodiversity analyses, and land-use monitoring. However, generating such information through laboratory analyses is costly and labor-intensive. Existing global resources, such as SoilGrids and statistical packages like <i>geodata</i>, provide raster data for soil layers at 0–5 cm, 5–15 cm, and 15–30 cm, but many applications require alternative depth intervals—for example, 0–20 cm for carbon stock modelling (e.g., CENTURY) or 0–30 cm for ecosystem services (e.g., InVEST) and studies of plant growth, water availability, and pollutant dynamics. To address this gap, we developed a global database of raster soil data at depths of 0–10 cm, 0–15 cm, 0–20 cm, 0–25 cm, and 0–30 cm. Soil attributes at specific depths were computed through interpolation and weighted averaging of SoilGrids rasters and validated against empirical soil data from multiple continents, compiled from major global repositories and peer-reviewed studies. Standard statistical procedures confirmed the robust accuracy of the interpolated rasters. The database provides values for bulk density (Mg m<sup>−3</sup>), soil texture (%), soil acidity (pH), total nitrogen (g kg<sup>−1</sup>), soil organic carbon (g kg<sup>−1</sup>), and soil organic carbon stock (Mg ha<sup>−1</sup>). Potential applications include (i) biogeochemical modelling of soil carbon, (ii) aboveground biomass modelling, (iii) species distribution modelling, (iv) biodiversity assessments, and (v) ecosystem diagnostics. The dataset is open access under a Creative Commons licence, adheres to FAIR (Findable, Accessible, Interoperable, and Reusable) principles, and is hosted on Zenodo due to its large size, links access: https://doi.org/10.5281/zenodo.14721139. Users are kindly requested to cite this paper when employing the dataset.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70052","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145986825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present the CIENS dataset, which contains ensemble weather forecasts from the operational convection-permitting numerical weather prediction model of the German Weather Service. It comprises forecasts for 55 meteorological variables mapped to the locations of synoptic stations, as well as additional spatially aggregated forecasts from surrounding grid points, available for a subset of these variables. Forecasts are available at hourly lead times from 0 to 21 h for two daily model runs initialised at 00 and 12 UTC, covering the period from December 2010 to June 2023. Additionally, the dataset provides station observations for six key variables at 170 locations across Germany: pressure, temperature, hourly precipitation accumulation, wind speed, wind direction, and wind gusts. Since the forecasts are mapped to the station locations, the data are delivered in a convenient format for analysis. The CIENS dataset complements the growing collection of benchmark datasets for weather and climate modelling. A key distinguishing feature is its long temporal extent, which encompasses multiple updates to the underlying numerical weather prediction model and thus supports investigations into how forecasting methods can account for such changes. In addition to detailing the design and contents of the CIENS dataset, we outline potential applications in ensemble post-processing, forecast verification, and related research areas. A use case focused on ensemble post-processing illustrates the benefits of incorporating the rich set of available model predictors into machine learning-based forecasting models.
{"title":"Operational Convection-Permitting COSMO/ICON Ensemble Predictions at Observation Sites (CIENS)","authors":"Sebastian Lerch, Benedikt Schulz, Reinhold Hess, Annette Möller, Cristina Primo, Sebastian Trepte, Susanne Theis","doi":"10.1002/gdj3.70051","DOIUrl":"https://doi.org/10.1002/gdj3.70051","url":null,"abstract":"<p>We present the CIENS dataset, which contains ensemble weather forecasts from the operational convection-permitting numerical weather prediction model of the German Weather Service. It comprises forecasts for 55 meteorological variables mapped to the locations of synoptic stations, as well as additional spatially aggregated forecasts from surrounding grid points, available for a subset of these variables. Forecasts are available at hourly lead times from 0 to 21 h for two daily model runs initialised at 00 and 12 UTC, covering the period from December 2010 to June 2023. Additionally, the dataset provides station observations for six key variables at 170 locations across Germany: pressure, temperature, hourly precipitation accumulation, wind speed, wind direction, and wind gusts. Since the forecasts are mapped to the observed locations, the data is delivered in a convenient format for analysis. The CIENS dataset complements the growing collection of benchmark datasets for weather and climate modelling. A key distinguishing feature is its long temporal extent, which encompasses multiple updates to the underlying numerical weather prediction model and thus supports investigations into how forecasting methods can account for such changes. In addition to detailing the design and contents of the CIENS dataset, we outline potential applications in ensemble post-processing, forecast verification, and related research areas. A use case focused on ensemble post-processing illustrates the benefits of incorporating the rich set of available model predictors into machine learning-based forecasting models.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70051","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145887731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We describe a library of atmospheric large eddy simulations (LES) of liquid-phase boundary layer clouds constructed to enable aerosol–cloud–turbulence interaction studies, support parameterization evaluation and development, and provide training data for machine learning applications. The simulations use a modern LES framework designed for high numerical accuracy, coupled to a detailed spectral bin microphysical scheme. Case studies are configured to represent observed conditions in four key global cloud regions—the Northeastern Atlantic, Northeastern Pacific, Continental United States and Southern Ocean—following a semi-idealised approach. The library also includes aerosol concentration halving and doubling experiments to expose the sensitivities of the case studies to aerosol perturbations. Simulation results are compared to observations on a case-by-case basis, then the library's coverage is evaluated in terms of spreads in meteorological factors and atmospheric boundary layer attributes.
{"title":"A Data Library of Liquid Clouds Modelled With a Large Eddy Simulation Framework","authors":"Colleen M. Kaul, Po-Lun Ma, Kyle G. Pressel, Jacob Shpund, Shuaiqi Tang, Mikhail Ovchinnikov, Meng Huang, Jerome Fast, Xiaojian Zheng, Xiquan Dong","doi":"10.1002/gdj3.70049","DOIUrl":"https://doi.org/10.1002/gdj3.70049","url":null,"abstract":"<p>We describe a library of atmospheric large eddy simulations (LES) of liquid-phase boundary layer clouds constructed to enable aerosol–cloud–turbulence interaction studies, support parameterization evaluation and development, and provide training data for machine learning applications. The simulations use a modern LES framework designed for high numerical accuracy, coupled to a detailed spectral bin microphysical scheme. Case studies are configured to represent observed conditions in four key global cloud regions—the Northeastern Atlantic, Northeastern Pacific, Continental United States and Southern Ocean—following a semi-idealised approach. The library also includes aerosol concentration halving and doubling experiments to expose the sensitivities of the case studies to aerosol perturbations. Simulation results are compared to observations on a case-by-case basis, then the library's coverage is evaluated in terms of spreads in meteorological factors and atmospheric boundary layer attributes.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70049","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145887730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Long-term, spatially representative soil-moisture records are critical for characterising ecosystem responses to water availability. We present a decade-long (2014–2024) dataset of continuous soil-moisture observations from distributed in situ networks and cosmic-ray neutron sensing (CRNS) across a 1 ha temperate deciduous forest in Germany. Spatial sensor coverage varied over time and challenged the derivation of a consistent spatial average due to the persistence of soil moisture patterns. We therefore implemented a semi-automatic workflow that (i) identifies reference periods via a bootstrap-based minimum required number of sensors (MRNS) and (ii) maps point measurements to the field-scale distribution using empirical CDF transformation. The resulting record provides a coherent long-term signal suitable for ecohydrological analyses and validation of remote-sensing products. Since any decade-scale monitoring will encounter sensor losses and replacements, we emphasise the critical role of robust data integration techniques to ensure the reliability of extended soil moisture datasets.
{"title":"From Points to Field Scale: A Decade of Soil-Moisture Monitoring in a German Deciduous Forest (2014–2024)","authors":"Felix Pohl, Martin Schrön, Corinna Rebmann, Luis Samaniego, Steffen Zacharias, Anke Hildebrandt","doi":"10.1002/gdj3.70053","DOIUrl":"https://doi.org/10.1002/gdj3.70053","url":null,"abstract":"<p>Long-term, spatially representative soil-moisture records are critical for characterising ecosystem responses to water availability. We present a decade-long (2014–2024) dataset of continuous soil-moisture observations from distributed in situ networks and cosmic-ray neutron sensing (CRNS) across a 1 ha temperate deciduous forest in Germany. Spatial sensor coverage varied over time and challenged the derivation of a consistent spatial average due to the persistence of soil moisture patterns. We therefore implemented a semi-automatic workflow that (i) identifies reference periods via a bootstrap-based minimum required number of sensors (MRNS) and (ii) maps point measurements to the field-scale distribution using empirical CDF transformation. The resulting record provides a coherent long-term signal suitable for ecohydrological analyses and validation of remote-sensing products. Since any decade-scale monitoring will encounter sensor losses and replacements, we emphasise the critical role of robust data integration techniques to ensure the reliability of extended soil moisture datasets.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70053","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145887653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific ocean drilling is crucial for understanding subsurface processes but remains vulnerable to geohazards such as blowouts and wellbore instability, posing catastrophic risks to operations, human safety, and ecosystems. Progress in pre-drill risk assessment remains limited due to the shortage of high-quality, integrated datasets for deep learning applications. Here, we release an interpreted Taranaki Basin dataset that combines geophysical, drilling, and overpressure data to advance pre-drill geohazard assessment. The dataset contains 17 seismic attributes extracted along well trajectories and paired rock-physics interpretations. This enables characterisation of four drilling geohazard categories: normal (safe) formations, wellbore collapse, overpressure, and combined overpressure-collapse conditions. We propose three deep learning benchmarks: an unsupervised clustering model using solely seismic attributes, an informed model incorporating geological priors as constraints, and an enhanced variant of the informed model with more trainable parameters. These benchmarks evaluate the ability of seismic data, and of its integration with complementary data, to distinguish drilling geohazard factors. Validation against traditional methods highlights the dataset's utility for advancing predictive geohazard frameworks. This work promotes risk mitigation, fosters collaboration, and enables reproducible research.
{"title":"A Deep Learning Dataset for Pre-Drill Geohazard Assessment in Taranaki Basin New Zealand","authors":"Zhi Geng, Zhijing Bai, Yan Cui, Yanfei Wang, Wenyong Pan, Caixia Yu, Hongzhou Zhang","doi":"10.1002/gdj3.70046","DOIUrl":"https://doi.org/10.1002/gdj3.70046","url":null,"abstract":"<p>Scientific ocean drilling is crucial for understanding subsurface processes but remains vulnerable to geohazards such as blowouts and wellbore instability, posing catastrophic risks to operations, human safety, and ecosystems. Progress in pre-drill risk assessment remains limited due to the shortage of high-quality, integrated datasets for deep learning applications. Here, we release an interpretated Taranaki Basin dataset that combines geophysical, drilling data, and overpressure data to advance pre-drill geohazard assessment. The dataset contains 17 seismic attributes extracted along well trajectories and paired rock-physical interpretations. This enables characterisation of four drilling geohazard categories: normal (safe) formations, wellbore collapse, overpressure, and combined overpressure-collapse conditions. We propose three deep learning benchmarks: an unsupervised clustering model using solely seismic attributes, an informed model incorporating geological prior as constraints, and an enhanced variant of the informed model using increased trainable parameters. These benchmarks evaluate the ability of seismic data, and its integration with complementary data, to distinguish drilling geohazard factors. Validation against traditional methods highlights the dataset's utility for advancing predictive geohazard frameworks. This work promotes risk mitigation, fosters collaboration, and enables reproducible research.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70046","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145824893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, we present ClimUAsd-stn.v2, an updated, publicly available version of the digital dataset of the earliest instrumental weather observations conducted in Ukraine. Compared to the first version, ClimUAsd-stn.v2 extends the temporal coverage of the rescued meteorological records from 1808–1850 to 1808–1880 and incorporates data from four additional stations. The dataset primarily consists of rescued sub-daily measurements of air temperature (ta) and atmospheric pressure (p). In its updated version, ClimUAsd-stn.v2 includes an additional 180,101 temperature and 167,274 pressure records, bringing the totals to 334,963 and 290,608 values, respectively. These measurements were collected at 12 stations during the 19th century. The rescued time series were quality controlled using the dataresqc software. Its iterative use revealed 1714 (~0.5%) erroneous and 6865 (~1.1%) suspicious values. In addition, the rescued meteorological data were compared with corresponding reference time series derived from the independent 20CRv3 historical reanalysis. The qualitative comparison (through visual inspection of time series plots) helped to identify 6624 (~1.1%) errors that remained in ClimUAsd-stn.v2 after the dataresqc application. The quantitative statistical comparison (performed after the correction of all detected errors) demonstrated generally good agreement between the rescued records and the reanalysis data. The ClimUAsd-stn.v2 dataset updates the already existing digitised Ukrainian archives of original 19th-century meteorological measurements. The rescued data can be used to enhance regional and global historical reanalyses, refine understanding of climate variability and compound extremes, and support interdisciplinary studies linking past weather, societal impacts and environmental crises.
{"title":"Early Instrumental Weather Observations From Ukraine: The ClimUAsd-Stn.v2 Dataset, 1808–1880","authors":"Olesya Skrynyk, Jürg Luterbacher, Rob Allan, Vladyslav Sidenko, Kateryna Saloid, Elena Xoplaki, Oleg Skrynyk, Volodymyr Osadchyi","doi":"10.1002/gdj3.70050","DOIUrl":"https://doi.org/10.1002/gdj3.70050","url":null,"abstract":"<p>In this study, we present ClimUAsd-stn.v2, an updated publicly open version of the digital dataset of the earliest instrumental weather observation conducted in Ukraine. Compared to the first version, ClimUAsd-stn.v2 extends the temporal coverage of the rescued meteorological records from 1808–1850 to 1808–1880 and incorporates data from four additional stations. The dataset primarily consists of rescued sub-daily measurements of air temperature (ta) and atmospheric pressure (p). In its updated version, ClimUAsd-stn.v2 includes an additional 180,101 temperature and 167,274 pressure records, bringing the total to 334,963 and 290,608 values, respectively. These measurements were collected at 12 stations during the 19th century. The rescued time series were quality controlled using the <i>dataresqc</i> software. Its iterative use revealed 1714 (~0.5%) erroneous and 6865 (~1.1%) suspicious values. In addition, the rescued meteorological data were compared with corresponding reference time series derived from the independent 20CRv3 historical reanalysis. The qualitative comparison (through visual inspection of time series plots) helped to identify 6624 (~1.1%) errors that remained in ClimUAsd-stn.v2 after the <i>dataresqc</i> application. The quantitative statistical comparison (performed after the correction of all detected errors) demonstrated generally good agreement between the rescued records and the reanalysis data. The ClimUAsd-stn.v2 dataset contributes to the update of the already existing digitised Ukrainian archives of original meteorological measurements in the 19th century. The rescued data have great potential to be used in regional climate analysis and improve historical reanalysis. In addition, they can be used to enhance regional and global historical reanalyses, refine understanding of climate variability and compound extremes, and support interdisciplinary studies linking past weather, societal impacts and environmental crises.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70050","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145824517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The weather station on the Loughborough University campus underwent refurbishment and upgrade in 2007, and this contribution reports on the outcome of 14 subsequent years of meteorological data collection there, before a further episode of upgrading. Data collection is described, with emphasis on the continuity, or lack of continuity, of the variables monitored. Of 136 instrument-years of deployment, only 36 are less than 90% complete, and 21 less than 75% complete. The description of data processing covers the method of retrieving 0900–0900 temperature maxima and minima and rainfall totals, corresponding to the standard UK and Ireland Climatological Day. As an independent check on the probable reliability of the campus weather dataset, values are correlated with, and regressed against, co-located values extracted from the UK Met Office HadUK-grid dataset. Campus temperatures are slightly, but consistently, higher than those indicated by HadUK-grid, while HadUK-grid rainfall is on average almost 10% higher than that recorded on the campus. Trend-free statistical relationships between campus and HadUK-grid data imply that there is unlikely to be any significant temporal bias in the campus dataset. The contribution concludes with a consideration of recent and potential future applications of the dataset.
{"title":"A 14-Year Meteorological Dataset From a University Campus in the East Midlands of the UK","authors":"Richard Hodgkins","doi":"10.1002/gdj3.70048","DOIUrl":"https://doi.org/10.1002/gdj3.70048","url":null,"abstract":"<p>The weather station on the Loughborough University campus underwent refurbishment and upgrade in 2007, and this contribution reports on the outcome of 14 subsequent years of meteorological data collection there, before a further episode of upgrading. Data collection is described, with emphasis on the continuity or lack of continuity of the variables monitored. Out of 136 instrument-years deployment, only 36 are less than 90% complete, and 21 less than 75% complete. Data processing discusses the method of retrieving 0900-0900 temperature maxima and minima and rainfall totals, to correspond to the standard UK and Ireland Climatological Day. As an independent check on the probable reliability of the campus weather dataset, values are correlated with and regressed against co-located values extracted from the UK Met Office HadUK-grid dataset. Campus temperatures are slightly, but consistently, higher than those indicated by HadUK-grid, while HadUK-grid rainfall is on average almost 10% higher than that recorded on the campus. Trend-free statistical relationships between campus and HadUK-grid data imply that there is unlikely to be any significant temporal bias in the campus dataset. The contribution concludes with a consideration of recent and potential future applications of the dataset.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70048","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145824650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As an indicator associated with dynamic non-equilibrium fractionation and less affected by equilibrium condensation temperature, d-excess holds unique application potential in geographical science research. Its continuous observation is crucial for understanding the water cycle and the mechanisms of extreme precipitation, and for evaluating atmospheric circulation models. However, the current scarcity of datasets with high spatiotemporal resolution has left the spatiotemporal variation of d-excess poorly understood. Meanwhile, existing isotope general circulation models (iGCMs) suffer from high complexity and significant discrepancies in their results, which hinder in-depth research. To this end, this study constructed d-excess datasets at different time scales based on 98,423 stable isotope records, concentrated mainly in 1965–2021. The results show that the variations of d-excess differ significantly across climate types and time scales, highlighting the research gap that the dominant controls on d-excess in the boundary layer remain unclear. The d-excess dataset lays a foundation for the application of iGCMs in geographical science, including the exploration of boundary layer processes and the improvement of Earth system models.
{"title":"Global Precipitation d-excess Dataset: A Critical Resource for Geographical Science Research","authors":"Baijun Shang, Guofeng Zhu, Tong Li, Hui Gao, Feng Wang, Zhibin Zhou, Tonggang Fu","doi":"10.1002/gdj3.70043","DOIUrl":"https://doi.org/10.1002/gdj3.70043","url":null,"abstract":"<p>As an indicator associated with dynamic non-equilibrium fractionation and less affected by equilibrium condensation temperature, d-excess holds unique application potential in geographical science research. Its continuous observation is crucial for understanding the water cycle, mechanisms of extreme precipitation, and evaluating atmospheric circulation models. However, the current scarcity of datasets with high spatiotemporal resolution has resulted in an unclear understanding of the spatiotemporal variation processes of d-excess. Meanwhile, existing isotope general circulation models (iGCMs) suffer from issues such as high complexity and significant discrepancies in results, which hinder the advancement of in-depth research. To this end, this study constructed d-excess datasets with different time series based on 98,423 stable isotope records, in which the data is mainly concentrated in 1965–2021. The results show that the variations of d-excess differ significantly across different climate types and time scales. This highlights the research gap where the dominant factors of d-excess in the boundary layer remain unclear. The d-excess dataset constructed lays a foundation for the application of iGCMs in geographical science fields such as boundary layer process exploration, and improvement of earth system models.</p>","PeriodicalId":54351,"journal":{"name":"Geoscience Data Journal","volume":"13 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70043","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145739449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}