Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06838-8
Junyuan Li, Rongye Tang, Juan Feng, Tinghui Xie, Sitong Liu, Yang Li
Paracondylactis sinensis is a burrowing sea anemone inhabiting soft sediments along the Chinese coast, representing an ecologically and economically important actiniarian species. Despite its unique adaptations to hypoxia and sediment-associated stressors, genomic resources for burrowing sea anemones have been lacking. Here, we report a high-quality, chromosome-level genome assembly of P. sinensis. With PacBio HiFi long reads (39.77 × coverage), Illumina short reads, and Hi-C data, a 210.63 Mb genome with a contig N50 of 8.70 Mb and a scaffold N50 of 9.41 Mb was generated. A total of 93.44% of the assembly was anchored to 19 pseudo-chromosomes. BUSCO analysis indicated 95.91% completeness, confirming high assembly quality. Comprehensive annotation identified 19,420 protein-coding genes, of which 91.35% were functionally annotated. Repetitive elements accounted for 26.43% of the genome, with transposable elements representing 20.47%. This genome provides a crucial reference for understanding the genetic basis of environmental adaptation in P. sinensis and supports future efforts in its conservation, aquaculture, and bioactive compound exploration.
Title: Chromosome-scale genome of the burrowing sea anemone Paracondylactis sinensis. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-06838-8
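As a reading aid for the contig and scaffold N50 values quoted in the abstract above: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch follows; the function name `n50` and the toy lengths are illustrative assumptions, not data from the P. sinensis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when no lengths were supplied.
    raise ValueError("lengths must be non-empty")


# Made-up contig lengths (arbitrary units), for illustration only.
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))  # 8: 10 + 8 = 18 is the first running total >= 15
```

A scaffold N50 is computed the same way, using scaffold lengths in place of contig lengths.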
Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06858-4
Amy C Green, Selma B Guerreiro, Hayley J Fowler
Short-duration extreme rainfall events can cause flash flooding and infrastructure failures, yet resources to assess these remain limited, particularly at the global scale. Heterogeneous data availability, inconsistent quality control, and methodological differences hinder the development of comparable intensity-duration-frequency (IDF) estimates. To address this gap, we present GSDR-IDF, a global dataset of IDF curves derived from the largest quality-controlled sub-daily rain gauge dataset: the Global Sub-Daily Rainfall dataset (GSDR), comprising more than 24,000 hourly rain gauge records across all major climate regions. We apply robust extreme value analysis methods, including single-gauge and regional frequency approaches, to estimate return levels for 1-, 3-, 6- and 24-hour durations and for 10-, 30-, and 100-year return periods. These are then combined to give IDF curves for each rain gauge, providing an openly accessible, traceable, and reproducible resource for hydrological modelling, engineering design, flood-risk assessment and climate-resilience planning. This dataset represents a step change in accessibility and precision for global IDF estimation and enables a wide range of cross-disciplinary applications.
Title: Global Intensity-Duration-Frequency curves based on observed sub-daily rainfall (GSDR-IDF). Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-06858-4
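To make the idea of a return level concrete, the sketch below evaluates the T-year return level of a generalized extreme value (GEV) distribution. The abstract above says only that extreme value analysis was used; the choice of a GEV distribution, the function name `gev_return_level`, and every parameter value here are assumptions for illustration and are not taken from GSDR-IDF.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """T-year return level of a GEV(mu, sigma, xi) distribution.

    mu is the location, sigma the scale (> 0), and xi the shape parameter,
    with xi > 0 meaning a heavy upper tail. The return level is the
    quantile exceeded with probability 1 / return_period in any one year.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if return_period <= 1:
        raise ValueError("return_period must exceed 1")
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as xi tends to 0.
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical parameters for one gauge and one duration (mm per hour).
for period in (10, 30, 100):
    level = gev_return_level(mu=20.0, sigma=5.0, xi=0.1, return_period=period)
    print(f"{period}-year return level: {level:.1f} mm/h")
```

Repeating the calculation for each duration (1, 3, 6 and 24 hours) at a fixed return period gives the points of one IDF curve. Note that some libraries use the opposite sign convention for the shape parameter, so parameters fitted elsewhere may need their sign flipped before being passed to a function like this one.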
Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06855-7
Lingfeng Zha, Chengbo Fu, Xue Sha, Peijun Yin, Yanze Li
We present a longitudinal electronic health record (EHR) dataset from Wuhan Union Hospital, compiled from two distinct hospital information systems. The first dataset, derived from a legacy system, includes 35,243 patients and covers the period from 2010 to 2020. The second dataset, collected via the research-oriented YIDUYUN system, includes 37,975 patients and spans 2011 to 2024. Both datasets provide structured, de-identified clinical information, including medical record number, demographics, diagnoses, admissions, discharges, timestamp records, laboratory test results (including COVID-19 test records) and patients' residential region. Using the patients' residential regions, we linked the data to the China Statistical Yearbook to obtain regional socioeconomic indices. While not specifically designed for pandemic research, the dataset captures both pre-pandemic and post-pandemic periods with de-identified exact timestamps, making it suitable for analyzing long-term healthcare utilization, population behavior, and policy impacts. With comprehensive metadata and rigorous validation, this resource supports a wide range of applications in longitudinal health system research and data-driven modeling.
Title: CardioEHR: A longitudinal electronic health record dataset of cardiovascular patients from central China. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-06855-7
Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06781-8
Sepideh Hatamikia, Elisabeth Steiner, Eashrat Jahan Muniya, Soraya Elmirad, Arezoo Borji, Gernot Kronreif, Wolfgang Birkfellner, Martin Buschmann
Radiomics, the extraction of quantitative features from medical images, has shown great potential in improving precision diagnosis, prognosis, and treatment planning. However, the reproducibility of radiomics features remains a major challenge due to the variability introduced by differences in imaging devices, acquisition protocols, and image reconstruction methods. This study introduces the first open-access cone-beam computed tomography (CBCT) phantom dataset specifically designed to test reproducibility in on-board imaging systems used in C-arm linear accelerators for radiotherapy. Using a widely recognized Catphan phantom, CBCT images were acquired from multiple devices across different imaging parameters, including variations in mAs, slice thickness, and reconstruction filters. The dataset includes 120 CBCT volumes with corresponding region of interest (ROI) segmentations and radiomics features enabling comprehensive testing of radiomics feature stability across intra- and inter-vendor comparisons. By providing this open-access dataset, the study aims to facilitate the standardization of CBCT radiomics research, improve feature reproducibility, and support the development of robust radiomics models for clinical applications.
Title: RadRepro CBCT: An Open-Access CBCT Phantom Dataset for Improved Standardization and Reproducibility of Radiomics Research. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-06781-8
This study reports the first telomere-to-telomere (T2T) genome assembly of Castanopsis orthacantha, a keystone tree species of significant ecological and economic value that is endemic to the subtropical evergreen forests of southwestern China. Using multi-platform sequencing data and high-throughput chromosome conformation capture (Hi-C) scaffolding, we generated a chromosome-scale assembly. The final assembly spanned 893.28 Mb, with a contig N50 of 76.19 Mb, indicating a high degree of continuity. Remarkably, 97.94% of the genome was anchored to 12 chromosomes. Terminal telomeric repeat sequences were identified at both ends of all chromosomes, and the assembly contained only a single unresolved gap. A total of 35,978 protein-coding genes were detected in the assembly, with an average coding sequence (CDS) length of 1,116.3 bp. Genomic analysis further revealed that repetitive elements comprised 59.28% of the genome. This near-complete reference genome of C. orthacantha provides a critical genomic resource for advancing evolutionary studies within the Fagaceae family and supports conservation genomics strategies aimed at the ecological restoration of this species.
Title: A telomere-to-telomere genome assembly of Castanopsis orthacantha (Fagaceae). Authors: Si Yin, Haibo Wang, Honglong Chu, Yanan Zhang, Changxin Luo, Yanguo Xu, Yong Gao. Journal: Scientific Data. Pub Date: 2026-02-14. DOI: 10.1038/s41597-026-06787-2
Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06749-8
Kate Oglethorpe, Joshua Lanham, Rafael S Reiss, Emma J D Boland, Alberto C Naveira Garabato, Colm-Cille P Caulfield, Ali Mashayek
The Arctic Ocean has been changing rapidly in a warming climate. To monitor these changes, it is useful to classify the Arctic Ocean into water masses, that is, bodies of water with similar origin and similar physical and biogeochemical properties. However, there are significant barriers to Arctic water mass classification: observations of seawater properties are sparse, and traditional classification relies on extensive knowledge of water mass characteristics and circulation. To address these challenges, we compile existing hydrographic observations of the upper 1000 m of the Arctic Ocean and classify these observations into water masses. We present the classification tool and accompanying dataset, Water Masses of the Arctic (WMA), to support basin-wide investigations of Arctic Ocean circulation, its variability, drivers and impacts on wider Arctic climate. Our dataset reproduces key spatial and temporal features of Arctic water masses, including Atlantic and Pacific Water pathways. The WMA dataset will improve understanding of Arctic Ocean dynamics and provide an accessible framework for assessing how accurately Earth System Models represent the Arctic Ocean.
Title: Water Masses of the Arctic from 40 Years of Hydrographic Observations. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-06749-8
Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06804-4
Setareh Amini, Adrian Huerta, Jörg Franke, Yuri Brugnara, Steven Caluwaerts, Julien Anet, Stevan Savić, Moritz Gubler, Gert-Jan Steeneveld, Lee Chapman, Fred Meier, Vincent Dubreuil, Andreas Christen, Matthias Zeeman, Branislava Lalić, Sebastian Schlögl, Jukka Käyhkö, AmirMasoud Azadfar, Stefan Brönnimann
This study provides a comprehensive dataset (FAIRUrbTemp) that addresses the lack of high-resolution urban air temperature data across Europe. It compiles sub-hourly street-level air temperature data from 811 low-cost to commercial sensors across several European cities and offers data in a quality-controlled, standardized format in sub-hourly, hourly, and daily resolutions. In addition, detailed metadata, as an important source of information in urban studies, is provided at network, station, and measurement levels. This pan-European dataset is rigorously quality-controlled using a serially automatic method applicable to diverse city-scale air temperature data, which identifies systematic and minor inconsistencies to enhance reliability. Expert-based validation shows that the QC reliably identifies problematic measurements, while its performance varies across urban and climatic settings due to local environmental and instrumental effects. To ensure transparency, the results of the quality control are provided to the user together with the original value in the dataset. The validated FAIRUrbTemp is a valuable resource for urban climate studies, with direct applications in validating microclimate models, assessing heat-health risks, and informing climate-adaptive urban planning.
Title: Comprehensive compilation and quality assessment of street-level urban air temperature measurements across European networks. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-06804-4
Acrossocheilus wenchowensis is a lukewarm-water fish found in southern Chinese mountain streams, valued for both ornamental and edible purposes. We assembled a near telomere-to-telomere (T2T) genome using HiFi, ONT, Hi-C and Illumina data. The assembly is approximately 870.69 Mb with a contig N50 of about 21.28 Mb. Of the assembled chromosomes, 14 in Hap1 and 15 in Hap2 reached T2T level. A total of 24,909 protein-coding genes were predicted in Hap1 and 24,496 in Hap2, with BUSCO scores of 97.4% and 97.6%, respectively. A conserved centromeric satellite sequence (262 bp) derived from an LTR transposon was identified. Comparative genomics showed that Acrossocheilus and Onychostoma diverged approximately 13.7 million years ago (Mya), while A. wenchowensis diverged from A. fasciatus about 5.25 Mya. Resequencing of four geographic populations of A. wenchowensis, followed by SNP and InDel analysis, revealed that the LY group has a genetic structure distinct from that of the other populations. This genome provides a framework for diploid T2T studies in fish and supports further functional genomics research.
Title: Near telomere-to-telomere diploid genome assembly of Acrossocheilus wenchowensis. Authors: Lingzhan Xue, Mingkun Luo, Haoyu Wang, Wenbin Zhu, Duhuang Chen, Gaoxiong Zeng, Mengxiang Liao, Ji Zhao, Bin Wu, Luohao Xu, Zaijie Dong. Journal: Scientific Data. Pub Date: 2026-02-14. DOI: 10.1038/s41597-026-06752-z
Pub Date : 2026-02-14 DOI: 10.1038/s41597-025-06419-1
Vignesh Sampath, Andrew S Lee, Samuel David Miller, Noah H Paulson, Yuepeng Zhang, Logan Ward
Electrodes are key components of many energy storage and energy conversion devices such as batteries and fuel cells. Defects in electrodes can significantly influence device performance and reliability and thus need to be monitored and eliminated during the electrode manufacturing process. Advancements in in-line metrology, computer vision, and machine learning have enabled the development of integrated hardware-software systems for automated defect detection and diagnostics. While several manufacturing domains have published defect datasets to support such efforts, datasets specific to electrode coating processes are not publicly available. To fill this gap and support research on defect detection for automated coating processes, we present CoatingVision, a comprehensive dataset of slot-die coating images with labeled defect types. This dataset supports a diverse range of image recognition tasks, including defect segmentation, defect detection, and multi-label classification. It includes high-resolution images with associated labels for common defects such as surface cracks, delamination cracks, pinholes, and unclassified defects. To facilitate benchmarking and reproducible research, CoatingVision is packaged with an open-source codebase that enables comparative evaluation of AI models and hyperparameter configurations. The dataset has been curated to ensure high quality and consistency, providing researchers with reliable data for training and evaluating computer vision models. With over 2,200 image samples acquired under various processing conditions, CoatingVision offers a robust foundation for developing automated defect detection systems. It promotes deeper insight into defect formation in coating manufacturing processes, which can be used to advance coating-related applications including batteries and fuel cells.
Title: A Defect Dataset for Electrode Coating Manufacturing. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-025-06419-1
Pub Date : 2026-02-14 DOI: 10.1038/s41597-026-06877-1
Wenbo Yu, Jun Yang, Yuyu Zhou, Xiangming Xiao
Continuing global warming and urbanization have increased the frequency and severity of extreme heat events in cities. Therefore, understanding how the urban heat island (UHI) effect influences cities is essential for developing effective mitigation and prevention strategies. A 1-km resolution dataset was constructed to assess heat wave exposure attributable to UHIs in urban human settlements worldwide from 2003 to 2020. An adaptive urban-rural threshold method was employed to delineate the spatial extent of UHI impacts, and a spatiotemporally fitted MODIS surface temperature dataset was used to address missing data caused by cloud contamination. This dataset explicitly separates the contributions of background climate, local landscape characteristics, and urbanization to heat wave exposure, providing a scientific basis for identifying key UHI mitigation areas and developing early-warning models of heat wave risk that account for UHI effects. The proposed methodology and dataset support synergistic decision-making for integrating urban climate adaptation with sustainable development, and the technical framework can be extended to studies of UHIs and heat wave exposure in other regions worldwide.
{"title":"Global dataset on heat wave exposure due to the urban heat island effect.","authors":"Wenbo Yu, Jun Yang, Yuyu Zhou, Xiangming Xiao","doi":"10.1038/s41597-026-06877-1","DOIUrl":"https://doi.org/10.1038/s41597-026-06877-1","url":null,"PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":" ","pages":""},"PeriodicalIF":6.9,"publicationDate":"2026-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146197991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}