Scientific Data: Latest Articles (Page 6)

Global Intensity-Duration-Frequency curves based on observed sub-daily rainfall (GSDR-IDF).

IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06858-4

Amy C Green, Selma B Guerreiro, Hayley J Fowler

Short-duration extreme rainfall events can cause flash flooding and infrastructure failures, yet resources to assess them remain limited, particularly at the global scale. Heterogeneous data availability, inconsistent quality control, and methodological differences hinder the development of comparable intensity-duration-frequency (IDF) estimates. To address this gap, we present GSDR-IDF, a global dataset of IDF curves derived from the largest quality-controlled sub-daily rain gauge dataset: the Global Sub-Daily Rainfall dataset (GSDR), comprising more than 24,000 hourly rain gauge records covering all major climate regions. We apply robust extreme value analysis methods, including single-gauge and regional frequency approaches, to estimate return levels for 1-, 3-, 6- and 24-hour durations and for 10-, 30-, and 100-year return periods. These are then combined to give IDF curves for each rain gauge, providing an openly accessible, traceable, and reproducible resource for hydrological modelling, engineering design, flood-risk assessment and climate-resilience planning. This dataset represents a step change in accessibility and precision for global IDF estimation and enables a wide range of cross-disciplinary applications.
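To make the idea of a return level concrete, the sketch below computes return levels from a generalized extreme value (GEV) distribution and arranges them into a small IDF table. The abstract does not say which distribution or fitting procedure GSDR-IDF uses, so the GEV choice, the function names, and all parameter values here are assumptions for illustration only.

```python
import math


def gev_return_level(mu, sigma, xi, return_period_years):
    """Return level of a GEV fitted to annual maxima.

    mu, sigma, xi are location, scale and shape (xi > 0 means a heavy tail
    in this sign convention). Illustrative only, not the GSDR-IDF method.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if return_period_years <= 1:
        raise ValueError("return period must exceed 1 year")
    # Non-exceedance probability is p = 1 - 1/T; y = -ln(p)
    y = -math.log(1.0 - 1.0 / return_period_years)
    if abs(xi) < 1e-9:
        # Gumbel limit of the GEV as xi approaches 0
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def idf_table(params_by_duration, return_periods=(10, 30, 100)):
    """Map each duration (hours) to {return period: mean intensity in mm/h}."""
    table = {}
    for duration_h, (mu, sigma, xi) in params_by_duration.items():
        table[duration_h] = {
            t: gev_return_level(mu, sigma, xi, t) / duration_h
            for t in return_periods
        }
    return table


# Hypothetical GEV parameters (rainfall depth in mm) for a single gauge
example_params = {
    1: (20.0, 5.0, 0.1),
    3: (30.0, 7.0, 0.1),
    6: (38.0, 9.0, 0.1),
    24: (60.0, 14.0, 0.1),
}
curves = idf_table(example_params)
```

Each row of `curves` is one point set on an IDF curve: for a fixed return period, intensity falls as duration grows, and for a fixed duration, intensity rises with the return period.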



Citations: 0

Multisession fNIRS-EEG data of Post-Stroke Motor Recovery. Recordings During Intact and Paretic Hand Movements.

IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06803-5

Alexandra Medvedeva, Nikolay Syrov, Lev Yakovlev, Yana Alieva, Artemiy Berkmush-Antipova, Galina Ivanova, Natalia Shusharina, Alexander Kaplan

Accurate diagnosis and monitoring of recovery after stroke are critical for effective motor rehabilitation. As stroke is inherently associated with impaired cerebral blood flow, functional near-infrared spectroscopy (fNIRS) provides a valuable tool for assessing hemodynamic changes in the brain. When combined with electroencephalography (EEG), this multimodal approach can provide complementary insights into neural and vascular responses during recovery. However, longitudinal datasets combining fNIRS and EEG in stroke populations remain limited. This article presents an open-access dataset with simultaneous fNIRS and EEG recordings from 16 post-stroke patients over 84 rehabilitation sessions. Participants performed motor tasks with both paretic and intact hands. The dataset includes raw and processed signals, clinical scores (ARAT, Fugl-Meyer) and patient demographics. This resource supports research into stroke recovery and the development of neurorehabilitation strategies and fNIRS-based brain-computer interfaces (BCI).



Citations: 0

CardioEHR: A longitudinal electronic health record dataset of cardiovascular patients from central China.

IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06855-7

Lingfeng Zha, Chengbo Fu, Xue Sha, Peijun Yin, Yanze Li

We present a longitudinal electronic health record (EHR) dataset from Wuhan Union Hospital, compiled from two distinct hospital information systems. The first dataset, derived from a legacy system, includes 35,243 patients and covers the period from 2010 to 2020. The second dataset, collected via the research-oriented YIDUYUN system, includes 37,975 patients and spans 2011 to 2024. Both datasets provide structured and de-identified clinical information, including medical record number, demographics, diagnoses, admissions, discharges, timestamp records, laboratory test results (including COVID-19 test records) and patients' residential region. Using the patients' residential regions, we linked the data with information from the China Statistical Yearbook to obtain regional socioeconomic indices. While not specifically designed for pandemic research, the dataset captures both pre-pandemic and post-pandemic periods with de-identified exact timestamps, making it suitable for analyzing long-term healthcare utilization, population behavior, and policy impacts. With comprehensive metadata and rigorous validation, this resource supports a wide range of applications in longitudinal health system research and data-driven modeling.



Citations: 0

RadRepro CBCT: An Open-Access CBCT Phantom Dataset for Improved Standardization and Reproducibility of Radiomics Research.

IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06781-8

Sepideh Hatamikia, Elisabeth Steiner, Eashrat Jahan Muniya, Soraya Elmirad, Arezoo Borji, Gernot Kronreif, Wolfgang Birkfellner, Martin Buschmann

Radiomics, the extraction of quantitative features from medical images, has shown great potential in improving precision diagnosis, prognosis, and treatment planning. However, the reproducibility of radiomics features remains a major challenge due to the variability introduced by differences in imaging devices, acquisition protocols, and image reconstruction methods. This study introduces the first open-access cone-beam computed tomography (CBCT) phantom dataset specifically designed to test reproducibility in on-board imaging systems used in C-arm linear accelerators for radiotherapy. Using a widely recognized Catphan phantom, CBCT images were acquired from multiple devices across different imaging parameters, including variations in mAs, slice thickness, and reconstruction filters. The dataset includes 120 CBCT volumes with corresponding region of interest (ROI) segmentations and radiomics features enabling comprehensive testing of radiomics feature stability across intra- and inter-vendor comparisons. By providing this open-access dataset, the study aims to facilitate the standardization of CBCT radiomics research, improve feature reproducibility, and support the development of robust radiomics models for clinical applications.



Citations: 0

A telomere-to-telomere genome assembly of Castanopsis orthacantha (Fagaceae).

IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06787-2

Si Yin, Haibo Wang, Honglong Chu, Yanan Zhang, Changxin Luo, Yanguo Xu, Yong Gao

This study reports the first telomere-to-telomere (T2T) genome assembly of Castanopsis orthacantha, a keystone tree species of significant ecological and economic value that is endemic to the subtropical evergreen forests of southwestern China. Using multi-platform sequencing data and high-throughput chromosome conformation capture (Hi-C) scaffolding, we generated a chromosome-scale assembly. The final assembly spanned 893.28 Mb, with a contig N50 of 76.19 Mb, indicating a high degree of continuity. Remarkably, 97.94% of the genome was anchored to 12 chromosomes. Terminal telomeric repeat sequences were identified at both ends of all chromosomes, and the assembly contained only a single unresolved gap. A total of 35,978 protein-coding genes were detected in the assembly, with an average coding sequence (CDS) length of 1,116.3 bp. Genomic analysis further revealed that repetitive elements comprised 59.28% of the genome. This near-complete reference genome of C. orthacantha provides a critical genomic resource for advancing evolutionary studies within the Fagaceae family and supports conservation genomics strategies aimed at the ecological restoration of this species.
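For readers unfamiliar with the contig N50 statistic quoted above, here is a minimal sketch of how it is conventionally computed: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative and are not taken from the paper.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths (in any consistent unit)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length covers
        # at least half of the total assembly length
        if 2 * running >= total:
            return length


# Toy example with made-up contig lengths (not data from the assembly)
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = contig_n50(toy_lengths)
```

A high N50 relative to the assembly size, as reported here, means that most of the sequence sits in a small number of very long contigs.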



Citations: 0

Water Masses of the Arctic from 40 Years of Hydrographic Observations.

IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06749-8

Kate Oglethorpe, Joshua Lanham, Rafael S Reiss, Emma J D Boland, Alberto C Naveira Garabato, Colm-Cille P Caulfield, Ali Mashayek

The Arctic Ocean has been changing rapidly in a warming climate. To monitor these changes, it is useful to classify the Arctic Ocean into water masses: bodies of water with similar origin and similar physical and biogeochemical properties. However, there are significant barriers to Arctic water mass classification: observations of seawater properties are sparse, and traditional classification relies on extensive knowledge of water mass characteristics and circulation. To address these challenges, we compile existing hydrographic observations of the upper 1000 m of the Arctic Ocean and classify these observations into water masses. We present the classification tool and accompanying dataset, Water Masses of the Arctic (WMA), to support basin-wide investigations of Arctic Ocean circulation, its variability, drivers and impacts on wider Arctic climate. Our dataset reproduces key spatial and temporal features of Arctic water masses, including Atlantic and Pacific Water pathways. The WMA dataset will improve understanding of Arctic Ocean dynamics and provide an accessible framework for assessing the accuracy of the representation of the Arctic Ocean in Earth System Models.



Citations: 0

Comprehensive compilation and quality assessment of street-level urban air temperature measurements across European networks.

IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06804-4

Setareh Amini, Adrian Huerta, Jörg Franke, Yuri Brugnara, Steven Caluwaerts, Julien Anet, Stevan Savić, Moritz Gubler, Gert-Jan Steeneveld, Lee Chapman, Fred Meier, Vincent Dubreuil, Andreas Christen, Matthias Zeeman, Branislava Lalić, Sebastian Schlögl, Jukka Käyhkö, AmirMasoud Azadfar, Stefan Brönnimann

This study provides a comprehensive dataset (FAIRUrbTemp) that addresses the lack of high-resolution urban air temperature data across Europe. It compiles sub-hourly street-level air temperature data from 811 sensors, ranging from low-cost to commercial-grade, across several European cities and offers data in a quality-controlled, standardized format at sub-hourly, hourly, and daily resolutions. In addition, detailed metadata, an important source of information in urban studies, is provided at network, station, and measurement levels. This pan-European dataset is rigorously quality-controlled using a sequential, automated method applicable to diverse city-scale air temperature data, which identifies systematic and minor inconsistencies to enhance reliability. Expert-based validation shows that the QC reliably identifies problematic measurements, while its performance varies across urban and climatic settings due to local environmental and instrumental effects. To ensure transparency, the results of the quality control are provided to the user together with the original value in the dataset. The validated FAIRUrbTemp is a valuable resource for urban climate studies, with direct applications in validating microclimate models, assessing heat-health risks, and informing climate-adaptive urban planning.
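The practice of publishing quality-control results alongside the original values, rather than deleting suspect readings, can be sketched as follows. This is not the FAIRUrbTemp procedure: the two checks (a plausible-range test followed by a step test), the flag names, and every threshold are hypothetical choices made only to show checks applied in sequence with the raw value preserved.

```python
def qc_flags(temps_c, min_c=-40.0, max_c=50.0, max_step_c=5.0):
    """Pair each reading with a QC flag; original values are never altered.

    Checks run in sequence: missing value, plausible range, then a jump
    relative to the last reading that passed. All thresholds are
    hypothetical and would need tuning per network and time resolution.
    """
    flagged = []
    previous_valid = None
    for value in temps_c:
        if value is None:
            flag = "missing"
        elif not (min_c <= value <= max_c):
            flag = "range"
        elif previous_valid is not None and abs(value - previous_valid) > max_step_c:
            # Limitation: a genuine abrupt change would also be flagged
            flag = "spike"
        else:
            flag = "ok"
            previous_valid = value
        flagged.append((value, flag))
    return flagged


# Made-up sub-hourly readings in degrees Celsius
readings = [10.0, 10.5, 30.0, 11.0, None, 99.0]
result = qc_flags(readings)
```

Keeping the flag next to the untouched reading lets a user apply stricter or looser filtering than the default without needing the raw files again.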



Citations: 0

Near telomere-to-telomere diploid genome assembly of Acrossocheilus wenchowensis.

IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06752-z

Lingzhan Xue, Mingkun Luo, Haoyu Wang, Wenbin Zhu, Duhuang Chen, Gaoxiong Zeng, Mengxiang Liao, Ji Zhao, Bin Wu, Luohao Xu, Zaijie Dong

Acrossocheilus wenchowensis is a lukewarm-water fish found in southern Chinese mountain streams, valued for both ornamental and edible purposes. We assembled a near telomere-to-telomere (T2T) genome using HiFi, ONT, Hi-C and Illumina data. The assembly is approximately 870.69 Mb with a contig N50 of about 21.28 Mb. Of the assembled chromosomes, 14 in Hap1 and 15 in Hap2 reach T2T level. A total of 24,909 protein-coding genes were predicted in Hap1 and 24,496 in Hap2, with BUSCO scores of 97.4% and 97.6%, respectively. A conserved centromeric satellite sequence (262 bp) derived from an LTR transposon was identified. Comparative genomics showed that Acrossocheilus and Onychostoma diverged approximately 13.7 million years ago (Mya), while A. wenchowensis diverged from A. fasciatus about 5.25 Mya. Resequencing of four geographic populations of A. wenchowensis, analyzed using SNPs and InDels, revealed a distinct genetic structure in the LY group compared to the other populations. This genome provides a framework for diploid T2T studies in fish and supports further functional genomics research.



Citations: 0

A Defect Dataset for Electrode Coating Manufacturing.

IF 6.9 Zone 2 (multidisciplinary journals) Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-025-06419-1

Vignesh Sampath, Andrew S Lee, Samuel David Miller, Noah H Paulson, Yuepeng Zhang, Logan Ward

Electrodes are key components of many energy storage and energy conversion devices such as batteries and fuel cells. Defects in electrodes can significantly influence device performance and reliability and thus need to be monitored and eliminated during the electrode manufacturing process. Advancements in in-line metrology, computer vision, and machine learning have enabled the development of integrated hardware-software systems for automated defect detection and diagnostics. While several manufacturing domains have published defect datasets to support such efforts, no public datasets specific to electrode coating processes are available. To fill this gap and support research on defect detection for automated coating processes, we present CoatingVision, a comprehensive dataset of slot-die coating images with labeled defect types. This dataset supports a diverse range of image recognition tasks, including defect segmentation, defect detection, and multi-label classification. It includes high-resolution images with associated labels for common defects such as surface cracks, delamination cracks, pinholes, and unclassified defects. To facilitate benchmarking and reproducible research, CoatingVision is packaged with an open-source codebase that enables comparative evaluation of AI models and hyperparameter configurations. The dataset has been curated to ensure high quality and consistency, providing researchers with reliable data for training and evaluating computer vision models. With over 2,200 image samples acquired under various processing conditions, CoatingVision offers a robust foundation for developing automated defect detection systems. It promotes deeper insights into defect formation in coating manufacturing processes, which can be used to advance various coating-related applications including batteries and fuel cells.
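
As a rough illustration of the multi-label classification task mentioned above, a single image may carry several defect types at once, so its target is typically a multi-hot vector rather than a single class index. The sketch below assumes hypothetical class identifiers derived from the four defect types named in the abstract; the actual label format and naming in CoatingVision are not described here and may differ.

```python
# Hypothetical class identifiers based on the defect types named in the abstract;
# the dataset's real label names and file format are an assumption.
DEFECT_CLASSES = ["surface_crack", "delamination_crack", "pinhole", "unclassified"]


def multi_hot(labels, classes=DEFECT_CLASSES):
    """Encode a set of defect labels as a 0/1 vector ordered like `classes`."""
    present = set(labels)
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown defect labels: {sorted(unknown)}")
    return [1 if name in present else 0 for name in classes]


# An image showing both a pinhole and a surface crack.
print(multi_hot(["pinhole", "surface_crack"]))  # [1, 0, 1, 0]
```

A defect-free image would map to an all-zero vector under this encoding, which is one reason multi-hot targets are often preferred over adding an explicit "no defect" class.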

Citations: 0

Global dataset on heat wave exposure due to the urban heat island effect.

IF 6.9 Zone 2 (multidisciplinary journals) Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06877-1

Wenbo Yu, Jun Yang, Yuyu Zhou, Xiangming Xiao

Continuing global warming and urbanization have increased the frequency and severity of extreme heat events in cities. Understanding how the urban heat island (UHI) effect influences cities is therefore essential for developing effective mitigation and prevention strategies. A 1-km resolution dataset was constructed to assess heat wave exposure attributable to UHIs in urban human settlements worldwide from 2003 to 2020. An adaptive urban-rural threshold method was employed to delineate the spatial extent of UHI impacts, and a spatiotemporally fitted MODIS surface temperature dataset was used to address missing data caused by cloud contamination. This dataset explicitly separates the contributions of background climate, local landscape characteristics, and urbanization to heat wave exposure, providing a scientific basis for identifying key UHI mitigation areas and developing heat wave risk early warning models that account for UHI effects. The proposed methodology and dataset support synergistic decision-making for integrating urban climate adaptation with sustainable development, and the technical framework can be extended to studies of UHIs and heat wave exposure in other regions worldwide.

Citations: 0