This Data Descriptor presents a comprehensive LC-MS/MS-based muscle proteome dataset for Clarias magur subjected to sustained sub-lethal thermal stress. Fish were maintained at a control temperature of 26 °C or gradually warmed at a rate of 1 °C per day until reaching 37 °C, which was sustained for 60 days. Muscle samples were collected, and proteins were extracted, digested with trypsin, and analyzed using a Waters SYNAPT G2-Si Q-TOF mass spectrometer. The resulting peptide spectra were searched against the Danio rerio Swiss-Prot (UniProt) database because proteome resources for C. magur are limited. The dataset includes raw mass spectral files, protein and peptide tables, sample metadata, water-quality logs, and processing files. These records are publicly available via PRIDE and are intended to support future studies on thermal tolerance in aquaculture species. The sole objective here is to provide a transparent and reusable thermal proteome resource.
Title: Muscle Proteomic Dataset of A Threatened Indian walking catfish, Clarias magur (Hamilton 1822) Exposed to Thermal Stress.
Journal: Scientific Data
Pub Date: 2026-02-16 DOI: 10.1038/s41597-026-06826-y
Authors: Poonam Jayant Singh, Arpita Batta, Satish Kumar Srivastava
Pub Date: 2026-02-16 DOI: 10.1038/s41597-026-06862-8
Penglin Wang, Huibin Ke, Yunfei Xue
Metal combustion, which is fundamentally a rapid exothermic redox reaction with oxygen, governs critical applications from aerospace propulsion to structural fire safety. Understanding key combustion metrics including combustion enthalpy, ignition temperature, ignition delay time, combustion rate, and threshold pressure is essential for designing fire-resistant alloys or high-energy propellants. This work establishes a comprehensive database of 725 curated data points extracted from 45 publications, mainly encompassing pure metals, Al-based, Ti-based, Mg-based, Fe-based alloys and multi-component alloys. Each data entry integrates combustion metrics with alloy composition and critical experimental metadata, such as sample geometry, oxygen partial pressure and test method. By integrating scattered literature data into a unified framework with standardized parameters, this work provides a foundation for data-driven discovery of next-generation materials with tailored combustion performance.
Title: An integrated database of combustion properties of metallic materials.
Journal: Scientific Data
Pub Date: 2026-02-16 DOI: 10.1038/s41597-026-06846-8
Limin Gao, Yue Dan, Haohao Wang, Ruiyu Li, Guang Yang
Uncertainty in blade machining deviations shifts the average performance of aero-engine compressors and increases performance scatter, posing a threat to safe engine operation. Quantifying the effects of this uncertainty is therefore critically important. However, because of prolonged inspection cycles and high costs, geometric data on blade machining deviations remain scarce. Most uncertainty quantification analyses are conducted under assumed statistical distributions of deviations, which makes it difficult to guarantee the accuracy of the quantification. This paper presents a dataset of measured machining deviations of 100 compressor rotor blades, covering 7 types of machining deviations at 13 blade sections from blade root to tip. The work fills a critical gap in available geometric deviation data for compressor rotor blades and provides a reliable foundation for subsequent uncertainty quantification studies.
Title: A dataset of measured machining deviations of compressor rotor blades.
Journal: Scientific Data
Pub Date: 2026-02-16 DOI: 10.1038/s41597-026-06835-x
Eva Moracho, Juan Miguel Arroyo, Blanca Arroyo-Correa, Gemma Calvo, Pablo Homet, Jorge Isla, Miguel Jácome-Flores, Irene Mendoza, Elena Quintero, Francisco Rodríguez-Sánchez, Pablo Villalva, Pedro Jordano
Mutualistic plant-animal interactions for seed dispersal are crucial for vegetation dynamics, benefiting over half of the world's plant species. Beyond the tropics, the Mediterranean biome harbors the highest proportion of species adapted to endozoochory, yet major gaps remain in quantifying interaction diversity in these biodiversity-rich areas and their links to ecosystem functioning. High-resolution, quantitative interaction data are essential not only to fill these gaps but also to enable large-scale ecological modeling of species interactions across biomes. Here, we present the FRUGivory INTegration (FRUGINT) dataset - an extensive collection of quantitative frugivory interactions and associated species traits from a Mediterranean biodiversity hotspot in southwestern Spain. By integrating six complementary sampling methods (camera trapping, continuous-monitoring cameras, DNA-barcoding, mist-netting, direct observation and track records) across multiple years, the dataset overcomes limitations of sampling biases, variable effort and spatio-temporal heterogeneity, providing a comprehensive picture of plant-frugivore interactions across the region. Based on a total of 37,923 interaction records and 481 unique pairwise interactions, involving 26 fleshy-fruited plant species present in Doñana and 78 frugivorous vertebrate species, FRUGINT yields estimates of regional-scale plant-frugivore networks based on pairwise interaction probabilities. The dataset encompasses both common and numerous rare interactions, offering a valuable resource for advancing research on plant-animal interactions, network ecology, and biodiversity conservation.
Title: A comprehensive, multi-method dataset of plant-frugivore interactions in a Mediterranean hotspot.
Journal: Scientific Data
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06838-8
Junyuan Li, Rongye Tang, Juan Feng, Tinghui Xie, Sitong Liu, Yang Li
Paracondylactis sinensis is a burrowing sea anemone inhabiting soft sediments along the Chinese coast, representing an ecologically and economically important actiniarian species. Despite its unique adaptations to hypoxia and sediment-associated stressors, genomic resources for burrowing sea anemones have been lacking. Here, we report a high-quality, chromosome-level genome assembly of P. sinensis. With PacBio HiFi long reads (39.77 × coverage), Illumina short reads, and Hi-C data, a 210.63 Mb genome with a contig N50 of 8.70 Mb and a scaffold N50 of 9.41 Mb was generated. A total of 93.44% of the assembly was anchored to 19 pseudo-chromosomes. BUSCO analysis indicated 95.91% completeness, confirming high assembly quality. Comprehensive annotation identified 19,420 protein-coding genes, of which 91.35% were functionally annotated. Repetitive elements accounted for 26.43% of the genome, with transposable elements representing 20.47%. This genome provides a crucial reference for understanding the genetic basis of environmental adaptation in P. sinensis and supports future efforts in its conservation, aquaculture, and bioactive compound exploration.
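The contig and scaffold N50 values quoted above follow the standard definition: the length L such that sequences of length L or longer together cover at least half of the total assembly. A minimal Python sketch of that calculation is given below. The function name `n50` and the toy contig lengths are illustrative assumptions, not values from the P. sinensis assembly or code from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150, half of the 300 total
```

Applied to real data, the same function would take the contig (or scaffold) lengths read from the assembly FASTA index.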
Title: Chromosome-scale genome of the burrowing sea anemone Paracondylactis sinensis.
Journal: Scientific Data
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06858-4
Amy C Green, Selma B Guerreiro, Hayley J Fowler
Short-duration extreme rainfall events can cause flash flooding and infrastructure failures, yet resources to assess these remain limited, particularly at the global scale. Heterogeneous data availability, inconsistent quality control, and methodological differences hinder the development of comparable intensity-duration-frequency (IDF) estimates. To address this gap, we present GSDR-IDF, a global dataset of intensity-duration-frequency curves derived from the largest quality-controlled sub-daily rain gauge dataset: the Global Sub-Daily Rainfall dataset (GSDR), comprising more than 24,000 hourly rain gauge records for all major climate regions. We apply robust extreme value analysis methods, including single-gauge and regional frequency approaches, to estimate return levels for 1-, 3-, 6- and 24-hour durations and for 10-, 30-, and 100-year return periods. These are then combined to give IDF curves for each rain gauge, providing an openly accessible, traceable, and reproducible resource for hydrological modelling, engineering design, flood-risk assessment and climate-resilience planning. This dataset represents a step change in accessibility and precision for global IDF estimation and enables a wide range of cross-disciplinary applications.
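To make the idea of a return level concrete, the sketch below fits a Gumbel distribution to annual maxima by the method of moments and evaluates the T-year return level, x_T = location - scale * ln(-ln(1 - 1/T)). This is a deliberately simple stand-in: the abstract does not state which distribution or fitting method GSDR-IDF uses, and the function names and rainfall values here are hypothetical, not taken from the dataset.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Estimate Gumbel (location, scale) by the method of moments."""
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    scale = std * math.sqrt(6) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, return_period_years):
    """Rainfall depth exceeded on average once every T years."""
    p_non_exceed = 1.0 - 1.0 / return_period_years
    return location - scale * math.log(-math.log(p_non_exceed))


# Hypothetical annual maxima of 1-hour rainfall (mm), for illustration only.
toy_maxima = [18.2, 25.1, 21.7, 30.4, 19.9, 27.3, 23.5, 35.0, 22.8, 26.1]
loc, scl = fit_gumbel_moments(toy_maxima)
for period in (10, 30, 100):
    print(period, round(return_level(loc, scl, period), 1))
```

Repeating the fit for each duration (1, 3, 6 and 24 hours) and joining the return levels for a given return period gives one curve of an IDF plot for that gauge.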
Title: Global Intensity-Duration-Frequency curves based on observed sub-daily rainfall (GSDR-IDF).
Journal: Scientific Data
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06803-5
Alexandra Medvedeva, Nikolay Syrov, Lev Yakovlev, Yana Alieva, Artemiy Berkmush-Antipova, Galina Ivanova, Natalia Shusharina, Alexander Kaplan
Accurate diagnosis and monitoring of recovery after stroke are critical for effective motor rehabilitation. As stroke is inherently associated with impaired cerebral blood flow, functional near-infrared spectroscopy (fNIRS) provides a valuable tool for assessing hemodynamic changes in the brain. When combined with electroencephalography (EEG), this multimodal approach can provide complementary insights into neural and vascular responses during recovery. However, longitudinal datasets combining fNIRS and EEG in stroke populations remain limited. This article presents an open-access dataset of simultaneous fNIRS and EEG recordings from 16 post-stroke patients over 84 rehabilitation sessions. Participants performed motor tasks with both paretic and intact hands. The dataset includes raw and processed signals, clinical scores (ARAT, Fugl-Meyer) and patient demographics. This resource supports research into stroke recovery and the development of neurorehabilitation strategies and fNIRS-based brain-computer interfaces (BCIs).
Title: Multisession fNIRS-EEG data of Post-Stroke Motor Recovery. Recordings During Intact and Paretic Hand Movements.
Journal: Scientific Data
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06855-7
Lingfeng Zha, Chengbo Fu, Xue Sha, Peijun Yin, Yanze Li
We present a longitudinal electronic health record (EHR) dataset from Wuhan Union Hospital, compiled from two distinct hospital information systems. The first dataset, derived from a legacy system, includes 35,243 patients and covers the period from 2010 to 2020. The second dataset, collected via the research-oriented YIDUYUN system, includes 37,975 patients and spans 2011 to 2024. Both datasets provide structured, de-identified clinical information, including medical record number, demographics, diagnoses, admissions, discharges, timestamp records, laboratory test results (including COVID-19 test records) and patients' residential region. Using the patients' residential regions, we linked the data to the China Statistical Yearbook to obtain regional socioeconomic indices. While not specifically designed for pandemic research, the dataset captures both pre-pandemic and post-pandemic periods with de-identified exact timestamps, making it suitable for analyzing long-term healthcare utilization, population behavior, and policy impacts. With comprehensive metadata and rigorous validation, this resource supports a wide range of applications in longitudinal health system research and data-driven modeling.
Title: CardioEHR: A longitudinal electronic health record dataset of cardiovascular patients from central China.
Journal: Scientific Data
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06781-8
Sepideh Hatamikia, Elisabeth Steiner, Eashrat Jahan Muniya, Soraya Elmirad, Arezoo Borji, Gernot Kronreif, Wolfgang Birkfellner, Martin Buschmann
Radiomics, the extraction of quantitative features from medical images, has shown great potential in improving precision diagnosis, prognosis, and treatment planning. However, the reproducibility of radiomics features remains a major challenge due to the variability introduced by differences in imaging devices, acquisition protocols, and image reconstruction methods. This study introduces the first open-access cone-beam computed tomography (CBCT) phantom dataset specifically designed to test reproducibility in on-board imaging systems used in C-arm linear accelerators for radiotherapy. Using a widely recognized Catphan phantom, CBCT images were acquired from multiple devices across different imaging parameters, including variations in mAs, slice thickness, and reconstruction filters. The dataset includes 120 CBCT volumes with corresponding region of interest (ROI) segmentations and radiomics features, enabling comprehensive testing of radiomics feature stability in intra- and inter-vendor comparisons. By providing this open-access dataset, the study aims to facilitate the standardization of CBCT radiomics research, improve feature reproducibility, and support the development of robust radiomics models for clinical applications.
Title: RadRepro CBCT: An Open-Access CBCT Phantom Dataset for Improved Standardization and Reproducibility of Radiomics Research.
Journal: Scientific Data
This study reports the first telomere-to-telomere (T2T) genome assembly of Castanopsis orthacantha, a keystone tree species of significant ecological and economic value that is endemic to the subtropical evergreen forests of southwestern China. Using multi-platform sequencing data and high-throughput chromosome conformation capture (Hi-C) scaffolding, we generated a chromosome-scale assembly. The final assembly spans 893.28 Mb, with a contig N50 of 76.19 Mb, indicating a high degree of continuity. In total, 97.94% of the genome was anchored to 12 chromosomes. Terminal telomeric repeat sequences were identified at both ends of all chromosomes, and the assembly contains only a single unresolved gap. A total of 35,978 protein-coding genes were detected in the assembly, with an average coding sequence (CDS) length of 1,116.3 bp. Genomic analysis further revealed that repetitive elements comprise 59.28% of the genome. This near-complete reference genome of C. orthacantha provides a critical genomic resource for advancing evolutionary studies within the Fagaceae family and supports conservation genomics strategies aimed at the ecological restoration of this species.
Title: A telomere-to-telomere genome assembly of Castanopsis orthacantha (Fagaceae).
Journal: Scientific Data
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06787-2
Authors: Si Yin, Haibo Wang, Honglong Chu, Yanan Zhang, Changxin Luo, Yanguo Xu, Yong Gao