Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06803-5
Alexandra Medvedeva, Nikolay Syrov, Lev Yakovlev, Yana Alieva, Artemiy Berkmush-Antipova, Galina Ivanova, Natalia Shusharina, Alexander Kaplan
Accurate diagnosis and monitoring of recovery after stroke are critical for effective motor rehabilitation. As stroke is inherently associated with impaired cerebral blood flow, functional near-infrared spectroscopy (fNIRS) provides a valuable tool for assessing hemodynamic changes in the brain. When combined with electroencephalography (EEG), this multimodal approach can provide complementary insights into neural and vascular responses during recovery. However, longitudinal datasets combining fNIRS and EEG in stroke populations remain limited. The current article presents an open-access dataset with simultaneous fNIRS and EEG recordings from 16 post-stroke patients over 84 rehabilitation sessions. Participants performed motor tasks with both paretic and intact hands. The dataset includes raw and processed signals, clinical scores (ARAT, Fugl-Meyer) and patient demographics. This resource supports research into stroke recovery, development of neurorehabilitation strategies and fNIRS-based brain-computer interfaces (BCI).
Title: Multisession fNIRS-EEG data of Post-Stroke Motor Recovery. Recordings During Intact and Paretic Hand Movements. Journal: Scientific Data.
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06855-7
Lingfeng Zha, Chengbo Fu, Xue Sha, Peijun Yin, Yanze Li
We present a longitudinal electronic health record (EHR) dataset from Wuhan Union Hospital, compiled from two distinct hospital information systems. The first dataset, derived from a legacy system, includes 35,243 patients and covers the period from 2010 to 2020. The second dataset, collected via the research-oriented YIDUYUN system, includes 37,975 patients and spans from 2011 to 2024. Both datasets provide structured and de-identified clinical information, including medical record number, demographics, diagnoses, admissions, discharges, timestamp records, laboratory test results (including COVID-19 test records) and patients' residential region. Using the patients' residential regions, we combined the data with information from the China Statistical Yearbook to collect regional socioeconomic indices. While not specifically designed for pandemic research, the dataset captures both pre-pandemic and post-pandemic periods with de-identified exact timestamps, making it suitable for analyzing long-term healthcare utilization, population behavior, and policy impacts. With comprehensive metadata and rigorous validation, this resource supports a wide range of applications in longitudinal health system research and data-driven modeling.
Title: CardioEHR: A longitudinal electronic health record dataset of cardiovascular patients from central China. Journal: Scientific Data.
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06781-8
Sepideh Hatamikia, Elisabeth Steiner, Eashrat Jahan Muniya, Soraya Elmirad, Arezoo Borji, Gernot Kronreif, Wolfgang Birkfellner, Martin Buschmann
Radiomics, the extraction of quantitative features from medical images, has shown great potential in improving precision diagnosis, prognosis, and treatment planning. However, the reproducibility of radiomics features remains a major challenge due to the variability introduced by differences in imaging devices, acquisition protocols, and image reconstruction methods. This study introduces the first open-access cone-beam computed tomography (CBCT) phantom dataset specifically designed to test reproducibility in on-board imaging systems used in C-arm linear accelerators for radiotherapy. Using a widely recognized Catphan phantom, CBCT images were acquired from multiple devices across different imaging parameters, including variations in mAs, slice thickness, and reconstruction filters. The dataset includes 120 CBCT volumes with corresponding region of interest (ROI) segmentations and radiomics features enabling comprehensive testing of radiomics feature stability across intra- and inter-vendor comparisons. By providing this open-access dataset, the study aims to facilitate the standardization of CBCT radiomics research, improve feature reproducibility, and support the development of robust radiomics models for clinical applications.
Title: RadRepro CBCT: An Open-Access CBCT Phantom Dataset for Improved Standardization and Reproducibility of Radiomics Research. Journal: Scientific Data.
This study reports the first telomere-to-telomere (T2T) genome assembly of Castanopsis orthacantha, a keystone tree species of significant ecological and economic value endemic to the subtropical evergreen forests of southwestern China. Using multi-platform sequencing data and high-throughput chromosome conformation capture (Hi-C) scaffolding, we generated a chromosome-scale assembly. The final assembly spanned 893.28 Mb, with a contig N50 of 76.19 Mb, indicating a high degree of continuity. Remarkably, 97.94% of the genome was anchored to 12 chromosomes. Terminal telomeric repeat sequences were identified at both ends of all chromosomes, and the assembly contained only a single unresolved gap. A total of 35,978 protein-coding genes were detected in the assembly, with an average coding sequence (CDS) length of 1,116.3 bp. Genomic analysis further revealed that repetitive elements comprised 59.28% of the genome. This near-complete reference genome of C. orthacantha provides a critical genomic resource for advancing evolutionary studies within the Fagaceae family and supports conservation genomics strategies aimed at the ecological restoration of this species.
Title: A telomere-to-telomere genome assembly of Castanopsis orthacantha (Fagaceae). Authors: Si Yin, Haibo Wang, Honglong Chu, Yanan Zhang, Changxin Luo, Yanguo Xu, Yong Gao. Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06787-2. Journal: Scientific Data.
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06749-8
Kate Oglethorpe, Joshua Lanham, Rafael S Reiss, Emma J D Boland, Alberto C Naveira Garabato, Colm-Cille P Caulfield, Ali Mashayek
The Arctic Ocean has been changing rapidly in a warming climate. To monitor these changes, it is useful to classify the Arctic Ocean into water masses: bodies of water with similar origin and physical and biogeochemical properties. However, there are significant barriers to Arctic water mass classification: observations of seawater properties are sparse, and traditional classification relies on extensive knowledge of water mass characteristics and circulation. To address these challenges, we compile existing hydrographic observations of the upper 1000 m of the Arctic Ocean and classify these observations into water masses. We present the classification tool and accompanying dataset, Water Masses of the Arctic (WMA), to support basin-wide investigations of Arctic Ocean circulation, its variability, drivers and impacts on wider Arctic climate. Our dataset reproduces key spatial and temporal features of Arctic water masses, including Atlantic and Pacific Water pathways. The WMA dataset will improve understanding of Arctic Ocean dynamics and provide an accessible framework for assessing the accuracy of the representation of the Arctic Ocean in Earth System Models.
Title: Water Masses of the Arctic from 40 Years of Hydrographic Observations. Journal: Scientific Data.
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06804-4
Setareh Amini, Adrian Huerta, Jörg Franke, Yuri Brugnara, Steven Caluwaerts, Julien Anet, Stevan Savić, Moritz Gubler, Gert-Jan Steeneveld, Lee Chapman, Fred Meier, Vincent Dubreuil, Andreas Christen, Matthias Zeeman, Branislava Lalić, Sebastian Schlögl, Jukka Käyhkö, AmirMasoud Azadfar, Stefan Brönnimann
This study provides a comprehensive dataset (FAIRUrbTemp) that addresses the lack of high-resolution urban air temperature data across Europe. It compiles sub-hourly street-level air temperature data from 811 low-cost to commercial sensors across several European cities and offers data in a quality-controlled, standardized format in sub-hourly, hourly, and daily resolutions. In addition, detailed metadata, as an important source of information in urban studies, is provided at network, station, and measurement levels. This pan-European dataset is rigorously quality-controlled using a serially automatic method applicable to diverse city-scale air temperature data, which identifies systematic and minor inconsistencies to enhance reliability. Expert-based validation shows that the QC reliably identifies problematic measurements, while its performance varies across urban and climatic settings due to local environmental and instrumental effects. To ensure transparency, the results of the quality control are provided to the user together with the original value in the dataset. The validated FAIRUrbTemp is a valuable resource for urban climate studies, with direct applications in validating microclimate models, assessing heat-health risks, and informing climate-adaptive urban planning.
Title: Comprehensive compilation and quality assessment of street-level urban air temperature measurements across European networks. Journal: Scientific Data.
Acrossocheilus wenchowensis is a lukewarm-water fish found in southern Chinese mountain streams, valued for both ornamental and edible purposes. We assembled a near telomere-to-telomere (T2T) genome using HiFi, ONT, Hi-C and Illumina data. The assembly is approximately 870.69 Mb with a contig N50 of about 21.28 Mb. Among these, 14 chromosomes in Hap1 and 15 chromosomes in Hap2 have reached T2T levels. A total of 24,909 protein-coding genes were predicted in Hap1 and 24,496 in Hap2, with BUSCO scores of 97.4% and 97.6%, respectively. A conserved centromeric satellite sequence (262 bp) derived from an LTR transposon was identified. Comparative genomics showed that Acrossocheilus and Onychostoma diverged approximately 13.7 million years ago (Mya), while A. wenchowensis diverged from A. fasciatus about 5.25 Mya. Resequencing of four geographic populations of A. wenchowensis revealed distinct genetic structure in the LY group compared to the other populations based on SNP and InDel analysis. This genome provides a framework for diploid T2T studies in fish and supports further functional genomics research.
Title: Near telomere-to-telomere diploid genome assembly of Acrossocheilus wenchowensis. Authors: Lingzhan Xue, Mingkun Luo, Haoyu Wang, Wenbin Zhu, Duhuang Chen, Gaoxiong Zeng, Mengxiang Liao, Ji Zhao, Bin Wu, Luohao Xu, Zaijie Dong. Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06752-z. Journal: Scientific Data.
Pub Date: 2026-02-14 DOI: 10.1038/s41597-025-06419-1
Vignesh Sampath, Andrew S Lee, Samuel David Miller, Noah H Paulson, Yuepeng Zhang, Logan Ward
Electrodes are key components of many energy storage and energy conversion devices such as batteries and fuel cells. Defects in electrodes can significantly influence device performance and reliability and thus need to be monitored and eliminated during the electrode manufacturing process. Advancements in in-line metrology, computer vision, and machine learning have enabled the development of integrated hardware-software systems for automated defect detection and diagnostics. While several manufacturing domains have published defect datasets to support such efforts, public datasets specific to electrode coating processes are lacking. To fill this gap and support research on defect detection for automated coating processes, we present CoatingVision, a comprehensive dataset of slot-die coating images with labeled defect types. This dataset supports a diverse range of image recognition tasks, including defect segmentation, defect detection, and multi-label classification. It includes high-resolution images with associated labels for common defects such as surface cracks, delamination cracks, pinholes, and unclassified defects. To facilitate benchmarking and reproducible research, CoatingVision is packaged with an open-source codebase that enables comparative evaluation of AI models and hyperparameter configurations. The dataset has been meticulously curated to ensure high quality and consistency, providing researchers with reliable data for training and evaluating computer vision models. With over 2,200 image samples under various processing conditions, CoatingVision offers a robust foundation for developing automated defect detection systems. It promotes deeper insights into defect formation in coating manufacturing processes, which can be used to advance various coating-related applications including batteries and fuel cells.
Title: A Defect Dataset for Electrode Coating Manufacturing. Journal: Scientific Data.
Pub Date: 2026-02-14 DOI: 10.1038/s41597-026-06877-1
Wenbo Yu, Jun Yang, Yuyu Zhou, Xiangming Xiao
Continuing global warming and urbanization have increased the frequency and severity of extreme heat events in cities. Therefore, understanding how the urban heat island (UHI) effect influences cities is essential for developing effective mitigation and prevention strategies. A 1-km resolution dataset was constructed to assess heat-wave exposure attributable to UHIs in urban human settlements worldwide from 2003 to 2020. An adaptive urban-rural threshold method was employed to delineate the spatial extent of UHI impacts, and a spatiotemporally fitted MODIS surface temperature dataset was used to address missing data caused by cloud contamination. This dataset explicitly separates the contributions of background climate, local landscape characteristics, and urbanization to heat wave exposure, providing a scientific basis for identifying key UHI mitigation areas and developing heat wave risk early warning models that account for UHI effects. The proposed methodology and dataset support synergistic decision-making for integrating urban climate adaptation with sustainable development, and the technical framework can be extended to studies of UHIs and heat wave exposure in other regions worldwide.
Title: Global dataset on heat wave exposure due to the urban heat island effect. Journal: Scientific Data.
The reed vole (Microtus fortis) is an important rodent model for studying unique biological traits, such as its natural resistance to Schistosoma japonicum. To facilitate the genetic study of these phenotypes, we have produced the first high-quality, chromosome-level genome assembly for this species. The genome was assembled using PacBio HiFi long-read sequencing and scaffolded to the chromosome level with Hi-C data. The final 2.29 Gb assembly exhibits excellent continuity (contig N50 = 68.89 Mb; scaffold N50 = 91.23 Mb), with 97.7% of the sequence anchored into 26 pseudomolecules, consistent with the species' karyotype. Genome completeness was estimated at 96.3% via BUSCO analysis (glires_odb10). The annotation includes 23,678 protein-coding genes, with 97.5% assigned a putative function. This publicly available, high-quality genomic resource will be invaluable for future research, providing the necessary foundation to explore the genetic mechanisms behind the unique adaptations of M. fortis, including its innate immunity, digestive physiology, and disease models. The assembly will also serve as a key reference for comparative genomics, enriching our understanding of rodent evolution.
{"title":"Assembling a chromosome-level genome for the Microtus fortis using PacBio HiFi and Hi-C technologies.","authors":"Du Zhang, Qi Hu, Tianqiong He, Junkang Zhou, Yixin Wen, Qian Liu, Jing Zhang, Wenlin Zhi, Lingxuan Ouyang, Suisui Gao, Ruotong Guan, Zhijun Zhou","doi":"10.1038/s41597-026-06813-3","DOIUrl":"https://doi.org/10.1038/s41597-026-06813-3","url":null,"abstract":"<p>The reed vole (Microtus fortis) is an important rodent model for studying unique biological traits, such as its natural resistance to Schistosoma japonicum. To facilitate the genetic study of these phenotypes, we have produced the first high-quality, chromosome-level genome assembly for this species. The genome was assembled using PacBio HiFi long-read sequencing and scaffolded to the chromosome level with Hi-C data. The final 2.29 Gb assembly exhibits excellent continuity (contig N50 = 68.89 Mb; scaffold N50 = 91.23 Mb), with 97.7% of the sequence anchored into 26 pseudomolecules, consistent with the species' karyotype. Genome completeness was estimated at 96.3% via BUSCO analysis (glires_odb10). The annotation includes 23,678 protein-coding genes, with 97.5% assigned a putative function. This publicly available, high-quality genomic resource will be invaluable for future research, providing the necessary foundation to explore the genetic mechanisms behind the unique adaptations of M. fortis, including its innate immunity, digestive physiology, and disease models. 
The assembly will also serve as a key reference for comparative genomics, enriching our understanding of rodent evolution.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":" ","pages":""},"PeriodicalIF":6.9,"publicationDate":"2026-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146195484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}