Scientific Data: latest articles, page 5

An integrated database of combustion properties of metallic materials.

IF 6.9 | Zone 2, comprehensive journals | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-16 DOI: 10.1038/s41597-026-06862-8

Penglin Wang, Huibin Ke, Yunfei Xue

Metal combustion, which is fundamentally a rapid exothermic redox reaction with oxygen, governs critical applications from aerospace propulsion to structural fire safety. Understanding key combustion metrics including combustion enthalpy, ignition temperature, ignition delay time, combustion rate, and threshold pressure is essential for designing fire-resistant alloys or high-energy propellants. This work establishes a comprehensive database of 725 curated data points extracted from 45 publications, mainly encompassing pure metals, Al-based, Ti-based, Mg-based, Fe-based alloys and multi-component alloys. Each data entry integrates combustion metrics with alloy composition and critical experimental metadata, such as sample geometry, oxygen partial pressure and test method. By integrating scattered literature data into a unified framework with standardized parameters, this work provides a foundation for data-driven discovery of next-generation materials with tailored combustion performance.


Citations: 0

A dataset of measured machining deviations of compressor rotor blades.

IF 6.9 | Zone 2, comprehensive journals | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-16 DOI: 10.1038/s41597-026-06846-8

Limin Gao, Yue Dan, Haohao Wang, Ruiyu Li, Guang Yang

Uncertainty in blade machining deviations shifts the average performance of aero-engine compressors and causes performance scatter, posing a threat to the safe operation of engines. Quantifying the uncertainty effects of machining deviations is therefore critically important. However, because inspection cycles are long and costs are high, geometric data on blade machining deviations remain scarce. Most uncertainty quantification analyses are conducted under assumed statistical distributions of deviations, making it difficult to guarantee the accuracy of the quantification. This paper presents a dataset of measured machining deviations for 100 compressor rotor blades, covering 7 types of machining deviation at 13 blade sections from root to tip. The work fills a critical gap in available geometric deviation data for compressor rotor blades and provides a reliable foundation for subsequent uncertainty quantification studies.


Citations: 0

A comprehensive, multi-method dataset of plant-frugivore interactions in a Mediterranean hotspot.

IF 6.9 | Zone 2, comprehensive journals | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-16 DOI: 10.1038/s41597-026-06835-x

Eva Moracho, Juan Miguel Arroyo, Blanca Arroyo-Correa, Gemma Calvo, Pablo Homet, Jorge Isla, Miguel Jácome-Flores, Irene Mendoza, Elena Quintero, Francisco Rodríguez-Sánchez, Pablo Villalva, Pedro Jordano

Mutualistic plant-animal interactions for seed dispersal are crucial for vegetation dynamics, benefiting over half of the world's plant species. Beyond the tropics, the Mediterranean biome harbors the highest proportion of species adapted to endozoochory, yet major gaps remain in quantifying interaction diversity in these biodiversity-rich areas and their links to ecosystem functioning. High-resolution, quantitative interaction data are essential not only to fill these gaps but also to enable large-scale ecological modeling of species interactions across biomes. Here, we present the FRUGivory INTegration (FRUGINT) dataset - an extensive collection of quantitative frugivory interactions and associated species traits from a Mediterranean biodiversity hotspot in southwestern Spain. By integrating six complementary sampling methods (camera trapping, continuous-monitoring cameras, DNA-barcoding, mist-netting, direct observation and track records) across multiple years, the dataset overcomes limitations of sampling biases, variable effort and spatio-temporal heterogeneity, providing a comprehensive picture of plant-frugivore interactions across the region. Based on a total of 37,923 interaction records and 481 unique pairwise interactions, involving 26 fleshy-fruited plant species present in Doñana and 78 frugivorous vertebrate species, FRUGINT yields estimates of regional-scale plant-frugivore networks based on pairwise interaction probabilities. The dataset encompasses both common and numerous rare interactions, offering a valuable resource for advancing research on plant-animal interactions, network ecology, and biodiversity conservation.
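The distinction between interaction records and unique pairwise interactions can be illustrated with a minimal sketch. It assumes each record is a (plant, frugivore) tuple; the labels below are hypothetical placeholders, and the plain relative frequency shown here is only an illustration, not the estimator FRUGINT uses for its pairwise interaction probabilities.

```python
from collections import Counter


def pairwise_frequencies(records):
    """Map each unique (plant, frugivore) pair to its share of all records.

    This is an empirical relative frequency, assumed here for illustration;
    it ignores sampling effort and method, which a real analysis must handle.
    """
    counts = Counter(records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one interaction record")
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical toy records, not taken from the dataset.
records = [
    ("plant_A", "bird_X"),
    ("plant_A", "bird_X"),
    ("plant_A", "mammal_Y"),
    ("plant_B", "bird_X"),
]
freqs = pairwise_frequencies(records)
print(len(records), "records,", len(freqs), "unique pairwise interactions")
```

Many records can collapse into few unique pairs, which is why the abstract reports the two totals separately.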



Citations: 0

Multisession fNIRS-EEG data of Post-Stroke Motor Recovery. Recordings During Intact and Paretic Hand Movements.

IF 6.9 | Zone 2, comprehensive journals | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06803-5

Alexandra Medvedeva, Nikolay Syrov, Lev Yakovlev, Yana Alieva, Artemiy Berkmush-Antipova, Galina Ivanova, Natalia Shusharina, Alexander Kaplan

Accurate diagnosis and monitoring of recovery after stroke are critical for effective motor rehabilitation. As stroke is inherently associated with impaired cerebral blood flow, functional near-infrared spectroscopy (fNIRS) provides a valuable tool for assessing hemodynamic changes in the brain. When combined with electroencephalography (EEG), this multimodal approach can provide complementary insights into neural and vascular responses during recovery. However, longitudinal datasets combining fNIRS and EEG in stroke populations remain limited. The current article presents an open access dataset with simultaneous fNIRS and EEG recordings from 16 post-stroke patients over 84 rehabilitation sessions. Participants performed motor tasks with both paretic and intact hands. The dataset includes raw and processed signals, clinical scores (ARAT, Fugl-Meyer) and patient demographics. This resource supports research into stroke recovery, development of neurorehabilitation strategies and fNIRS-based brain computer interfaces (BCI).



Citations: 0

Chromosome-scale genome of the burrowing sea anemone Paracondylactis sinensis.

IF 6.9 | Zone 2, comprehensive journals | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06838-8

Junyuan Li, Rongye Tang, Juan Feng, Tinghui Xie, Sitong Liu, Yang Li

Paracondylactis sinensis is a burrowing sea anemone inhabiting soft sediments along the Chinese coast, representing an ecologically and economically important actiniarian species. Despite its unique adaptations to hypoxia and sediment-associated stressors, genomic resources for burrowing sea anemones have been lacking. Here, we report a high-quality, chromosome-level genome assembly of P. sinensis. With PacBio HiFi long reads (39.77 × coverage), Illumina short reads, and Hi-C data, a 210.63 Mb genome with a contig N50 of 8.70 Mb and a scaffold N50 of 9.41 Mb was generated. A total of 93.44% of the assembly was anchored to 19 pseudo-chromosomes. BUSCO analysis indicated 95.91% completeness, confirming high assembly quality. Comprehensive annotation identified 19,420 protein-coding genes, of which 91.35% were functionally annotated. Repetitive elements accounted for 26.43% of the genome, with transposable elements representing 20.47%. This genome provides a crucial reference for understanding the genetic basis of environmental adaptation in P. sinensis and supports future efforts in its conservation, aquaculture, and bioactive compound exploration.
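For readers unfamiliar with the contig and scaffold N50 figures quoted above, the following minimal sketch shows how the statistic is conventionally defined: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. The toy lengths are hypothetical and unrelated to the P. sinensis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; N50 is the length of the
    sequence at which the cumulative sum first reaches half the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # prints 8
```

A larger N50 indicates that more of the assembly sits in long sequences, which is why it is used as a contiguity measure.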



Citations: 0

Global Intensity-Duration-Frequency curves based on observed sub-daily rainfall (GSDR-IDF).

IF 6.9 | Zone 2, comprehensive journals | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06858-4

Amy C Green, Selma B Guerreiro, Hayley J Fowler

Short-duration extreme rainfall events can cause flash flooding and infrastructure failures, yet resources to assess these remain limited, particularly at the global scale. Heterogeneous data availability, inconsistent quality control, and methodological differences hinder the development of comparable intensity-duration-frequency (IDF) estimates. To address this gap, we present GSDR-IDF, a global dataset of intensity-duration-frequency curves derived from the largest quality-controlled sub-daily rain gauge dataset: the Global Sub-Daily Rainfall dataset (GSDR), comprising more than 24,000 hourly rain gauge records for all major climate regions. We apply robust extreme value analysis methods, including single-gauge and regional frequency approaches, to estimate return levels for 1-, 3-, 6- and 24-hour durations and for 10-, 30-, and 100-year return periods. These are then combined to give IDF curves for each rain gauge, providing an openly accessible, traceable, and reproducible resource for hydrological modelling, engineering design, flood-risk assessment and climate-resilience planning. This dataset represents a step change in accessibility and precision for global IDF estimation and enables a wide range of cross-disciplinary applications.
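To make the notion of a return level concrete, here is a minimal sketch that evaluates the standard closed-form quantile of a generalized extreme value (GEV) distribution fitted to annual maxima. The abstract does not state which distribution GSDR-IDF uses, so the GEV choice is an assumption, and the parameter values are hypothetical rather than taken from the dataset.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Return level for a GEV with location mu, scale sigma, shape xi.

    return_period is in years (> 1); the annual exceedance probability
    is 1 / return_period. The xi -> 0 case falls back to the Gumbel form.
    """
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1 year")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-9:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical parameters for 1-hour annual maxima (mm), illustration only.
mu, sigma, xi = 20.0, 5.0, 0.1
for period in (10, 30, 100):
    print(period, "years:", round(gev_return_level(mu, sigma, xi, period), 2))
```

Repeating this for each duration (1, 3, 6 and 24 hours) and joining the results per return period would give one gauge's IDF curves under these assumptions.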



Citations: 0

CardioEHR: A longitudinal electronic health record dataset of cardiovascular patients from central China.

IF 6.9 | Zone 2, comprehensive journals | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06855-7

Lingfeng Zha, Chengbo Fu, Xue Sha, Peijun Yin, Yanze Li

We present a longitudinal electronic health record (EHR) dataset from Wuhan Union Hospital, compiled from two distinct hospital information systems. The first dataset, derived from a legacy system, includes 35,243 patients and covers the period from 2010 to 2020. The second dataset, collected via the research-oriented YIDUYUN system, includes 37,975 patients and covers the period from 2011 to 2024. Both datasets provide structured and de-identified clinical information, including medical record number, demographics, diagnoses, admissions, discharges, record timestamps, laboratory test results (including COVID-19 test records) and patients' residential region. Using the patients' residential regions, we combined the data with information from the China Statistical Yearbook to collect regional socioeconomic indices. While not specifically designed for pandemic research, the dataset captures both pre-pandemic and post-pandemic periods with de-identified exact timestamps, making it suitable for analyzing long-term healthcare utilization, population behavior, and policy impacts. With comprehensive metadata and rigorous validation, this resource supports a wide range of applications in longitudinal health system research and data-driven modeling.



Citations: 0

RadRepro CBCT: An Open-Access CBCT Phantom Dataset for Improved Standardization and Reproducibility of Radiomics Research.

IF 6.9 | Zone 2, comprehensive journals | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06781-8

Sepideh Hatamikia, Elisabeth Steiner, Eashrat Jahan Muniya, Soraya Elmirad, Arezoo Borji, Gernot Kronreif, Wolfgang Birkfellner, Martin Buschmann

Radiomics, the extraction of quantitative features from medical images, has shown great potential in improving precision diagnosis, prognosis, and treatment planning. However, the reproducibility of radiomics features remains a major challenge due to the variability introduced by differences in imaging devices, acquisition protocols, and image reconstruction methods. This study introduces the first open-access cone-beam computed tomography (CBCT) phantom dataset specifically designed to test reproducibility in on-board imaging systems used in C-arm linear accelerators for radiotherapy. Using a widely recognized Catphan phantom, CBCT images were acquired from multiple devices across different imaging parameters, including variations in mAs, slice thickness, and reconstruction filters. The dataset includes 120 CBCT volumes with corresponding region of interest (ROI) segmentations and radiomics features enabling comprehensive testing of radiomics feature stability across intra- and inter-vendor comparisons. By providing this open-access dataset, the study aims to facilitate the standardization of CBCT radiomics research, improve feature reproducibility, and support the development of robust radiomics models for clinical applications.
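One simple way to screen feature stability across repeated phantom acquisitions is the coefficient of variation (CoV). The sketch below assumes a table mapping each feature name to its values across scans of the same region of interest; the feature names, values, and the 10% cut-off are hypothetical, and the abstract does not say which stability metric the authors apply.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CoV is undefined for a zero mean")
    return stdev(values) / abs(m)


def stable_features(feature_table, max_cov=0.10):
    """Names of features whose CoV across acquisitions is at most max_cov.

    max_cov = 0.10 is an assumed threshold chosen for illustration.
    """
    return sorted(
        name
        for name, values in feature_table.items()
        if coefficient_of_variation(values) <= max_cov
    )


# Hypothetical feature values from four acquisitions of one phantom insert.
table = {
    "mean_intensity": [100.0, 101.0, 99.0, 100.5],
    "glcm_contrast": [1.0, 1.8, 0.6, 1.4],
}
print(stable_features(table))  # prints ['mean_intensity']
```

Other agreement measures, such as the intraclass correlation coefficient, are also common in reproducibility studies and may be more appropriate when several regions of interest are compared.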



Citations: 0

A telomere-to-telomere genome assembly of Castanopsis orthacantha (Fagaceae).

IF 6.9 | Zone 2, comprehensive journals | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06787-2

Si Yin, Haibo Wang, Honglong Chu, Yanan Zhang, Changxin Luo, Yanguo Xu, Yong Gao

This study reports the first telomere-to-telomere (T2T) genome assembly of Castanopsis orthacantha, a keystone tree species of significant ecological and economic value that is endemic to the subtropical evergreen forests of southwestern China. Using multi-platform sequencing data and high-throughput chromosome conformation capture (Hi-C) scaffolding, we generated a chromosome-scale assembly. The final assembly spans 893.28 Mb, with a contig N50 of 76.19 Mb, indicating a high degree of continuity. Remarkably, 97.94% of the genome was anchored to 12 chromosomes. Terminal telomeric repeat sequences were identified at both ends of every chromosome, and the assembly contains only a single unresolved gap. A total of 35,978 protein-coding genes were detected in the assembly, with an average coding sequence (CDS) length of 1,116.3 bp. Genomic analysis further revealed that repetitive elements comprise 59.28% of the genome. This near-complete reference genome of C. orthacantha provides a critical genomic resource for advancing evolutionary studies within the Fagaceae family and supports conservation genomics strategies aimed at the ecological restoration of this species.



Citations: 0

Water Masses of the Arctic from 40 Years of Hydrographic Observations.

IF 6.9, Zone 2 (Multidisciplinary Journals), Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data

Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06749-8

Kate Oglethorpe, Joshua Lanham, Rafael S Reiss, Emma J D Boland, Alberto C Naveira Garabato, Colm-Cille P Caulfield, Ali Mashayek

The Arctic Ocean has been changing rapidly in a warming climate. To monitor these changes, it is useful to classify the Arctic Ocean into water masses, that is, bodies of water with similar origin and similar physical and biogeochemical properties. However, there are significant barriers to Arctic water mass classification: observations of seawater properties are sparse, and traditional classification relies on extensive knowledge of water mass characteristics and circulation. To address these challenges, we compile existing hydrographic observations of the upper 1000 m of the Arctic Ocean and classify these observations into water masses. We present the classification tool and accompanying dataset, Water Masses of the Arctic (WMA), to support basin-wide investigations of Arctic Ocean circulation, its variability, drivers and impacts on wider Arctic climate. Our dataset reproduces key spatial and temporal features of Arctic water masses, including Atlantic and Pacific Water pathways. The WMA dataset will improve understanding of Arctic Ocean dynamics and provide an accessible framework for assessing the accuracy of the representation of the Arctic Ocean in Earth System Models.



Citations: 0