Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-06951-8
Yuxuan Ouyang, Lin Yang, Binbin Cheng, Chang Xiao, Weiwei Zhang
The Blue-crowned Laughingthrush (Pterorhinus courtoisi) is a critically endangered species and listed as National First-class Protected Wildlife in China, with a small population size and highly restricted geographic distribution in Jiangxi Province. However, the genetic mechanisms underlying its endangered status remain unclear. In this study, we constructed a chromosome-level reference genome by integrating Illumina short-read, PacBio long-read, and Hi-C chromatin interaction data. The final assembled genome spans 1.255 Gb, with 1.158 Gb (92.32%) of the sequences anchored onto 39 pseudochromosomes. A total of 16,807 protein-coding genes were predicted, among which 15,574 genes (92.7%) were functionally annotated. This high-quality genome assembly provides a valuable genomic resource for future genetic studies and conservation efforts for the Blue-crowned Laughingthrush.
Title: Chromosome-scale Genome Assembly of the Critically Endangered Blue-crowned Laughingthrush (Pterorhinus courtoisi, Leiothrichidae). Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-06951-8
Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-06975-0
Avalon S Moore, Bridget Vitu, Felicia Fraizer-Bisner, Peter J Williams, Lucy van der Merwe, Abdelrhman Gouda, Dessislava Kirilova, Christopher Pittenger, Helen Pushkarskaya
National COVID-19 response plans in the United States recognized that the primary responsibility for addressing domestic health emergencies lay with states and localities, though each state's pandemic response authority varied. States utilized a range of tools to manage infectious-disease outbreaks, including vaccination rules, incentives, and communication strategies. This database includes online publications from state governors and departments of health across all 50 U.S. states and the District of Columbia. It spans from December 2020, when Phase 1a of the COVID-19 vaccination allocation began, to September 2021, when vaccines were widely available and often mandated. In total, 5,223 unique publications were collected, each classified by type: Flyer, Milestone, Info, and Policy. We also address key considerations for analyzing this data and suggest potential research questions that can be explored with it.
Title: COVID Diaries, State Response to COVID Vaccination Program, December 2020 to September 2021. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-06975-0
Pub Date: 2026-03-20 DOI: 10.1038/s41597-025-06510-7
Lu Song, Zhihao He, Yinghao Pan, Haijun Yue
Cities worldwide are rapidly adopting smart governance strategies to address complex urban challenges, yet systematic measurement of their effectiveness remains limited. This study develops and applies a comprehensive Smart Governance Index (SGI) to evaluate governance transformation across 296 Chinese cities from 2017 to 2023. Our framework integrates three critical dimensions of urban governance: the Value Objectives Sub-index (VOS) that establishes normative goals and strategic priorities; the System Applications Sub-index (SAS) that delivers governance services through operational platforms; and the Institutional-Technical Support Sub-index (ITSS) that provides the underlying infrastructure and organizational capacity. This multidimensional assessment reveals substantial heterogeneity in smart governance adoption and effectiveness across Chinese cities, with global implications for urban policy design. The initial version of the dataset includes SGI and its sub-indices for 296 Chinese cities from 2017 to 2023, with annual updates planned. The spatiotemporal patterns identified demonstrate how cities at different development stages can optimize their governance pathways, offering insights for achieving sustainable urban transformation in diverse contexts.
Title: A dataset of the smart governance index for Chinese cities. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-025-06510-7
Intraoperative cardiac arrhythmias present distinct characteristics compared to non-surgical environments, yet publicly available electrocardiogram (ECG) databases have primarily focused on ambulatory or intensive care environments. To address this gap, we present the VitalDB Arrhythmia Database, a comprehensive collection of intraoperative ECG recordings with beat and rhythm labels specifically designed for developing and validating arrhythmia detection algorithms in surgical patients. The database comprises 734,528 seconds of continuous ECG data from 482 surgical patients, with a median annotated recording duration of 20 minutes. It contains over 660,000 annotated heartbeats across four beat types and 10 distinct rhythm categories. To efficiently process the extensive source data, we developed a custom deep learning beat classifier that serves as an automated screening tool for arrhythmia candidate segments. All annotations underwent rigorous validation by five anesthesiologists, with each segment independently reviewed by at least two anesthesiologists, and 9.3% required full committee consensus. Inter-rater reliability analysis demonstrated excellent agreement with an overall Cohen's kappa of 0.930 ± 0.130. This publicly accessible resource provides the research community with clinically validated intraoperative arrhythmia data, facilitating the development of robust arrhythmia detection algorithms and enabling multimodal analysis to investigate the hemodynamic impact of intraoperative arrhythmias.
Title: VitalDB Arrhythmia Database: An Anesthesiologist-Validated Large-scale Intraoperative Arrhythmia Dataset with Beat and Rhythm Labels. Authors: Da-In Eun, Kayoung Shim, Hyunsoo Lee, Yeji Lim, Hanbyeol Lim, Hyeonhoon Lee, Jiwon Lee, Hyung-Chul Lee. Journal: Scientific Data. Pub Date: 2026-03-20. DOI: 10.1038/s41597-026-07076-8
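The inter-rater agreement statistic quoted in the abstract, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. The sketch below shows the standard two-rater calculation only. The function name `cohens_kappa` and the beat labels are hypothetical illustrations, not data or code from the database, and the abstract does not say how the per-pair values were aggregated into the overall figure.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical beat labels (N = normal, V = ventricular); not from the database.
a = ["N", "N", "V", "N", "V", "N", "N", "V"]
b = ["N", "N", "V", "N", "N", "N", "N", "V"]
kappa = cohens_kappa(a, b)
print(round(kappa, 3))
```

Here the raters agree on 7 of 8 beats, yet kappa is well below 0.875 because much of that agreement is expected by chance when one label dominates.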
Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-07047-z
Dmitry Malikov, Lev Krasnov, Marina Kiseleva, Elizaveta Meshcheriakova, Fedor Kuznetsov, Vladimir Elistratov, Matvei Vasiyarov, Sergei Tatarin, Stanislav Bezzubov
Solubility is a crucial property of organic compounds, impacting their potential applications in synthetic chemistry, materials science and drug design. Moreover, technological processes often use mixtures of solvents, which complicates solubility assessment. Predicting solubility in solvent mixtures from molecular structure can help to address this issue, although a large and diverse dataset is needed to pursue data-driven studies effectively. In this research, we present a dataset containing 175,166 experimental solubility values in the temperature range from 252 to 383 K for 810 organic compounds, covering 3,001 unique solute-binary solvent systems and 750 unique binary solvent mixtures extracted from 1,115 peer-reviewed articles. The solubility data and molecular structures of solutes and solvents are translated to a unified machine-readable format, facilitating data analysis and machine learning model development. An interactive online tool for visualization and navigation through the data has also been developed. This dataset can serve as a comprehensive benchmark for predicting solubility in mixtures of solvents.
Title: Dataset of solubility values for organic compounds in binary mixtures of solvents at various temperatures. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-07047-z
A high-resolution climate projection dataset is produced by statistically downscaling climate projections from the CMIP6 experiment. The dataset has a spatial resolution of 0.0375° × 0.0375° and is derived from 19 climate models over the Senegal domain. It includes five essential daily surface variables: mean, minimum, and maximum air temperature, precipitation, and terrestrial radiation. The dataset covers daily climate data for the historical period (1850-2014) and future projections (2015-2100) for three greenhouse gas emissions scenarios: SSP1-2.6, SSP2-4.5, and SSP5-8.5. The downscaling method is the "Cumulative Distribution Function-transform", which is used for bias correction and has been widely referenced in peer-reviewed literature. The data processing includes rigorous quality control of metadata following climate modelling community standards, and outlier detection to ensure data integrity.
Title: High-Resolution Downscaled CMIP6 Projections dataset of Key Climate Variables for Senegal. Authors: Asse Mbengue, Benjamin Sultan, Redouane Lguensat, Mathieu Vrac, Aïda Diongue-Niang, Ousmane Ndiaye, Amadou Thierno Gaye. Journal: Scientific Data. Pub Date: 2026-03-20. DOI: 10.1038/s41597-026-07059-9
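For readers unfamiliar with the Cumulative Distribution Function-transform (CDF-t), its core relation as it is usually stated in the downscaling literature can be written as follows. The symbols are notation introduced here for illustration and do not come from the abstract: F is a cumulative distribution function, subscript G marks climate-model output, S the local-scale reference data, h the historical calibration period, and f the future period. This is a sketch of the general idea, not a description of the authors' exact configuration.

```latex
% Transformation learned on the historical period, so that T(F_{Gh}(x)) = F_{Sh}(x)
T(u) = F_{Sh}\!\left(F_{Gh}^{-1}(u)\right), \qquad u \in [0,1]

% Applied to the future model CDF to estimate the local-scale future CDF
F_{Sf}(x) = T\!\left(F_{Gf}(x)\right)
          = F_{Sh}\!\left(F_{Gh}^{-1}\!\left(F_{Gf}(x)\right)\right)
```

Bias-corrected values are then typically obtained by quantile mapping between the model's future CDF and the estimated local-scale future CDF.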
Landslides are a major geological hazard causing significant casualties and economic losses. Reliable risk assessment requires high-quality spatiotemporal event data, yet no publicly available landslide catalogue with fine-grained precision exists for China. To address this, we developed a landslide event catalogue for mainland China from 2008 to 2024 based on news reports. The dataset was generated via large-scale web crawling, information extraction using an open-source large language model (LLM), event deduplication, geocoding, and multi-stage validation. It contains 1,582 events with detailed spatiotemporal attributes, some with minute-level temporal precision and spatial resolution down to the county, village, or specific reported site. Evaluation shows that, while casualty-related information is less accurate, the LLM reliably captures key attributes such as time, location, and triggering factors. This demonstrates the feasibility of using LLMs to extract critical landslide data from news reports. Compared with existing catalogues, our dataset offers more events and improved spatiotemporal accuracy, providing a valuable resource for landslide hazard assessment, early warning model development, and disaster risk management in China.
Title: A high-precision catalogue of landslide events in China based on news text mining with large language model. Authors: Binru Zhao, Lulu Zhang, Zhenxia Liu, Wenchao Ma, Jian Wang, Qiang Sun, Wen Luo, Zhaoyuan Yu, Linwang Yuan. Journal: Scientific Data. Pub Date: 2026-03-20. DOI: 10.1038/s41597-026-07066-w
Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-07074-w
Yu Gong, Yifei He, Xuefeng Zhang, Ling Wang, Haibo You, Mo Zhou, Jie Liu
Existing tomato datasets often focus on short-term experiments or lack integrated environmental and agronomic data. We present Horti-M3-Tomato, a comprehensive three-year dataset collected in a greenhouse in Northeast China, including high-resolution RGB images, environmental sensor data (recorded every 30 minutes), soil conditions, and detailed agronomic records such as yield data and management practices. Spanning three growing seasons (2023-2025), the dataset integrates temporal imaging, environmental monitoring, soil data, and manual phenotypic and yield records. Horti-M3-Tomato supports research on growth dynamics and genotype-environment interactions, and provides a benchmark for AI-based phenotyping and precision horticulture. The dataset is openly available for further research in controlled-environment agriculture.
Title: A Three-Year Multimodal Holistic Dataset For Horticultural Tomato Cultivation. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-07074-w
Pub Date: 2026-03-19 DOI: 10.1038/s41597-026-07075-9
Zehui Lao, Bei-Wen Ying
High-throughput phenotyping of microbial growth is crucial for understanding genotype-phenotype relationships in systems biology. Linking genetic variation to dynamic growth responses across environments remains challenging. Here, we present a time series dataset representing the growth curves of 3,909 single-gene knockout Escherichia coli strains grown in rich (LB) and minimal (M63) media. Using microplate assays with biological triplicates at 37 °C, we generated 23,454 OD600 time-series trajectories (3,909 strains × 2 media × 3 replicates) recorded every 15 minutes for 24-48 hours. The dataset provides plate-background-corrected growth curves, derived growth parameters including carrying capacity (K) and maximal growth rate (r), and gene category annotations. This standardized resource facilitates comparative analyses of genotype-dependent growth dynamics between rich and poor nutritional conditions and supports methodological development for time-series processing and growth-phenotype characterization. By making the complete growth trajectories publicly available with metadata and quality indicators, we aim to enable reuse and reproducible analyses of bacterial growth dynamics across the Keio collection.
Title: Growth dynamics of 3,909 Escherichia coli single-gene knockouts in rich and minimal media. Journal: Scientific Data. URL: https://doi.org/10.1038/s41597-026-07075-9
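The abstract reports a carrying capacity (K) and a maximal growth rate (r) for each curve without saying how they were derived. As a minimal sketch, assuming a logistic growth model purely for illustration, the code below generates a synthetic OD600 curve at the stated 15-minute sampling interval and recovers crude estimates of both parameters. The function names, the parameter values, and the estimation rule (K as the maximum OD, r as the steepest slope of ln(OD)) are assumptions and not the authors' method.

```python
import math


def logistic_od(t_hours, k, r, n0):
    """Logistic growth: OD at time t for carrying capacity k, rate r, initial OD n0."""
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t_hours))


def estimate_k_and_r(times, ods):
    """Crude estimates: K as the maximum OD, r as the steepest slope of ln(OD).

    Assumes background-corrected OD values are strictly positive.
    """
    k_est = max(ods)
    slopes = [
        (math.log(ods[i + 1]) - math.log(ods[i])) / (times[i + 1] - times[i])
        for i in range(len(ods) - 1)
    ]
    return k_est, max(slopes)


# Synthetic curve sampled every 15 minutes for 24 hours (made-up parameters).
times = [i * 0.25 for i in range(24 * 4 + 1)]
ods = [logistic_od(t, k=1.2, r=0.8, n0=0.001) for t in times]
k_est, r_est = estimate_k_and_r(times, ods)

# Trajectory count stated in the abstract: strains x media x replicates.
n_trajectories = 3909 * 2 * 3
print(len(times), n_trajectories)
```

On real plate-reader data the log-slope estimate is sensitive to noise at low OD, so a windowed regression or a fitted model would usually be preferred over adjacent-point differences.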
Pub Date: 2026-03-19 DOI: 10.1038/s41597-026-06642-4
Andreas Bueckle, Bruce W Herr, Lu Chen, Daniel Bolin, Danial Qaurooni, Michael Ginda, Yashvardhan Jain, Aleix Puig-Barbe, Kristin Ardlie, Fusheng Wang, Katy Börner
The human body contains ~27-37 trillion cells of up to 10,000 cell types (CTs) within a volume of ~62-120 liters (males) and 52-89 liters (females). The Human Reference Atlas (HRA) v2.3 provides a quantitative 3D framework of CTs across 73 reference organs and 1,283 3D anatomical structures (ASs). The HRA Cell Type Population (HRApop) effort has quantified CTs per AS using high-quality single-cell datasets processed through scalable, reproducible workflows and cell type annotation (CTann) tools. HRApop v1.0 includes reference CT populations for 73 ASs (112 when sex-specific) using 662 datasets spatially registered to 230 locations across 17 organs (31 when sex-specific). For 558 single-cell (sc-)transcriptomics datasets (11,042,750 cells), CTs and biomarker expressions were computed using Azimuth, CellTypist, and popV. To test generalizability, 104 sc-proteomics datasets (16,576,863 cells) were integrated. In total, HRApop includes 27,619,613 cells and serves as a healthy reference for researchers aiming to elucidate mechanisms underlying cellular interactions as well as cellular and tissue level disease progression, which may facilitate advancements in basic discovery and lead to new therapeutic strategies.
Title: Cell Type Populations for 3D Anatomical Structures of the Human Reference Atlas. Journal: Scientific Data. DOI: 10.1038/s41597-026-06642-4