Scientific Data: Latest Articles

Chromosome-scale Genome Assembly of the Critically Endangered Blue-crowned Laughingthrush (Pterorhinus courtoisi, Leiothrichidae).
IF 6.9 Zone 2 (Multidisciplinary Journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-06951-8
Yuxuan Ouyang, Lin Yang, Binbin Cheng, Chang Xiao, Weiwei Zhang

The Blue-crowned Laughingthrush (Pterorhinus courtoisi) is a critically endangered species and listed as National First-class Protected Wildlife in China, with a small population size and highly restricted geographic distribution in Jiangxi Province. However, the genetic mechanisms underlying its endangered status remain unclear. In this study, we constructed a chromosome-level reference genome by integrating Illumina short-read, PacBio long-read, and Hi-C chromatin interaction data. The final assembled genome spans 1.255 Gb, with 1.158 Gb (92.32%) of the sequences anchored onto 39 pseudochromosomes. A total of 16,807 protein-coding genes were predicted, among which 15,574 genes (92.7%) were functionally annotated. This high-quality genome assembly provides a valuable genomic resource for future genetic studies and conservation efforts for the Blue-crowned Laughingthrush.
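
The two percentages in this abstract can be recomputed from the quoted figures. The sketch below does so; note that the anchored share derived from the rounded Gb values comes out at about 92.27%, slightly below the reported 92.32%, presumably because the authors worked from unrounded base-pair counts (an assumption, since the abstract does not give them).

```python
# Figures quoted in the abstract (rounded as published).
assembly_gb = 1.255
anchored_gb = 1.158
predicted_genes = 16_807
annotated_genes = 15_574


def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


anchored_pct = percent(anchored_gb, assembly_gb)
annotated_pct = percent(annotated_genes, predicted_genes)

print(f"anchored onto pseudochromosomes: {anchored_pct:.2f}%")
print(f"functionally annotated genes:    {annotated_pct:.1f}%")
```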

Citations: 0
COVID Diaries, State Response to COVID Vaccination Program, December 2020 to September 2021.
IF 6.9 Zone 2 (Multidisciplinary Journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-06975-0
Avalon S Moore, Bridget Vitu, Felicia Fraizer-Bisner, Peter J Williams, Lucy van der Merwe, Abdelrhman Gouda, Dessislava Kirilova, Christopher Pittenger, Helen Pushkarskaya

National COVID-19 response plans in the United States recognized that the primary responsibility for addressing domestic health emergencies lay with states and localities, though each state's pandemic response authority varied. States utilized a range of tools to manage infectious-disease outbreaks, including vaccination rules, incentives, and communication strategies. This database includes online publications from state governors and departments of health across all 50 U.S. states and the District of Columbia. It spans from December 2020, when Phase 1a of the COVID-19 vaccination allocation began, to September 2021, when vaccines were widely available and often mandated. In total, 5,223 unique publications were collected, each classified by type: Flyer, Milestone, Info, and Policy. We also address key considerations for analyzing this data and suggest potential research questions that can be explored with it.

Citations: 0
A dataset of the smart governance index for Chinese cities.
IF 6.9 Zone 2 (Multidisciplinary Journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-20 DOI: 10.1038/s41597-025-06510-7
Lu Song, Zhihao He, Yinghao Pan, Haijun Yue

Cities worldwide are rapidly adopting smart governance strategies to address complex urban challenges, yet systematic measurement of their effectiveness remains limited. This study develops and applies a comprehensive Smart Governance Index (SGI) to evaluate governance transformation across 296 Chinese cities from 2017 to 2023. Our framework integrates three critical dimensions of urban governance: the Value Objectives Sub-index (VOS) that establishes normative goals and strategic priorities; the System Applications Sub-index (SAS) that delivers governance services through operational platforms; and the Institutional-Technical Support Sub-index (ITSS) that provides the underlying infrastructure and organizational capacity. This multidimensional assessment reveals substantial heterogeneity in smart governance adoption and effectiveness across Chinese cities, with global implications for urban policy design. The initial version of the dataset includes SGI and its sub-indices for 296 Chinese cities from 2017 to 2023, with annual updates planned. The spatiotemporal patterns identified demonstrate how cities at different development stages can optimize their governance pathways, offering insights for achieving sustainable urban transformation in diverse contexts.

Citations: 0
VitalDB Arrhythmia Database: An Anesthesiologist-Validated Large-scale Intraoperative Arrhythmia Dataset with Beat and Rhythm Labels.
IF 6.9 Zone 2 (Multidisciplinary Journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-07076-8
Da-In Eun, Kayoung Shim, Hyunsoo Lee, Yeji Lim, Hanbyeol Lim, Hyeonhoon Lee, Jiwon Lee, Hyung-Chul Lee

Intraoperative cardiac arrhythmias present distinct characteristics compared to non-surgical environments, yet publicly available electrocardiogram (ECG) databases have primarily focused on ambulatory or intensive care environments. To address this gap, we present the VitalDB Arrhythmia Database, a comprehensive collection of intraoperative ECG recordings with beat and rhythm labels specifically designed for developing and validating arrhythmia detection algorithms in surgical patients. The database comprises 734,528 seconds of continuous ECG data from 482 surgical patients, with a median annotated recording duration of 20 minutes. It contains over 660,000 annotated heartbeats across four beat types and 10 distinct rhythm categories. To efficiently process the extensive source data, we developed a custom deep learning beat classifier that serves as an automated screening tool for arrhythmia candidate segments. All annotations underwent rigorous validation by five anesthesiologists, with each segment independently reviewed by at least two anesthesiologists, and 9.3% required full committee consensus. Inter-rater reliability analysis demonstrated excellent agreement with an overall Cohen's kappa of 0.930 ± 0.130. This publicly accessible resource provides the research community with clinically validated intraoperative arrhythmia data, facilitating the development of robust arrhythmia detection algorithms and enabling multimodal analysis to investigate the hemodynamic impact of intraoperative arrhythmias.
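
The agreement statistic quoted here, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance. A minimal sketch follows; the beat labels `N`, `V`, and `S` and the two rating lists are hypothetical illustrations, not values from the database.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical beat labels from two annotators (not data from the paper).
annotator_1 = ["N", "N", "V", "N", "S", "V", "N", "N"]
annotator_2 = ["N", "N", "V", "N", "N", "V", "N", "N"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

With these toy labels the raters agree on 7 of 8 beats, and kappa is lower than the raw 0.875 agreement because most beats share the same majority label.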

Citations: 0
Dataset of solubility values for organic compounds in binary mixtures of solvents at various temperatures.
IF 6.9 Zone 2 (Multidisciplinary Journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-07047-z
Dmitry Malikov, Lev Krasnov, Marina Kiseleva, Elizaveta Meshcheriakova, Fedor Kuznetsov, Vladimir Elistratov, Matvei Vasiyarov, Sergei Tatarin, Stanislav Bezzubov

Solubility is a crucial property of organic compounds, impacting their potential applications in synthetic chemistry, materials science and drug design. Moreover, technological processes often use mixtures of solvents, which complicates solubility assessment. Predicting solubility in solvent mixtures from molecular structure can help to address this issue, although a large and diverse dataset is needed to pursue data-driven studies effectively. In this research, we present a dataset containing 175,166 experimental solubility values within the temperature range from 252 to 383 K for 810 organic compounds, covering 3,001 unique solute-binary solvent systems and 750 unique binary solvent mixtures, extracted from 1,115 peer-reviewed articles. The solubility data and molecular structures of solutes and solvents are translated to a unified machine-readable format, facilitating data analysis and machine learning model development. An interactive online tool for visualization and navigation through the data has also been developed. This dataset can serve as a comprehensive benchmark for predicting solubility in mixtures of solvents.

Citations: 0
High-Resolution Downscaled CMIP6 Projections dataset of Key Climate Variables for Senegal.
IF 6.9 Zone 2 (Multidisciplinary Journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-07059-9
Asse Mbengue, Benjamin Sultan, Redouane Lguensat, Mathieu Vrac, Aïda Diongue-Niang, Ousmane Ndiaye, Amadou Thierno Gaye

A high-resolution climate projections dataset is produced by statistically downscaling climate projections from the CMIP6 experiment. This dataset, derived from 19 climate models, has a spatial resolution of 0.0375° × 0.0375° over the Senegal domain. It includes five essential surface daily variables: mean, minimum, and maximum air temperatures, precipitation, and terrestrial radiation. The dataset covers daily climate data for the historical period (1850-2014) and future projections (2015-2100) for three greenhouse gas emissions scenarios: SSP1-2.6, SSP2-4.5, and SSP5-8.5. The downscaling method used is the "Cumulative Distribution Function-transform", which is utilized for bias correction and has been widely referenced in peer-reviewed literature. The data processing includes rigorous quality control of metadata following climate modelling community standards and outlier detection to ensure data integrity.
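
To give a feel for distribution-based bias correction, the sketch below implements plain empirical quantile mapping, a simpler relative of the "Cumulative Distribution Function-transform" named in the abstract; it is not the authors' method. The function name and the two small temperature samples are hypothetical.

```python
from bisect import bisect_right


def quantile_map(value, model_hist, obs_hist):
    """Map a model value onto the observed distribution by matching ranks.

    Plain empirical quantile mapping (illustrative stand-in, not CDF-t):
    find the rank of `value` within the historical model sample, then
    return the observation that sits at the same relative rank.
    """
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    n_mod, n_obs = len(model_sorted), len(obs_sorted)
    # Number of historical model values less than or equal to `value`.
    rank = bisect_right(model_sorted, value)
    # Integer form of ceil(rank * n_obs / n_mod) - 1, clamped to a valid index.
    idx = (rank * n_obs + n_mod - 1) // n_mod - 1
    idx = max(0, min(n_obs - 1, idx))
    return obs_sorted[idx]


# Hypothetical daily temperatures in degrees Celsius (not data from the paper).
model_hist = [10, 12, 14, 16, 18]
obs_hist = [11, 14, 17, 20, 23]

corrected = quantile_map(14, model_hist, obs_hist)
print(corrected)
```

Values outside the historical model range are clamped to the lowest or highest observation, which is one of the limitations that more elaborate methods are designed to address.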

Citations: 0
A high-precision catalogue of landslide events in China based on news text mining with large language model.
IF 6.9 Zone 2 (Multidisciplinary Journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-07066-w
Binru Zhao, Lulu Zhang, Zhenxia Liu, Wenchao Ma, Jian Wang, Qiang Sun, Wen Luo, Zhaoyuan Yu, Linwang Yuan

Landslides are a major geological hazard causing significant casualties and economic losses. Reliable risk assessment requires high-quality spatiotemporal event data, yet no publicly available landslide catalogue with fine-grained precision exists for China. To address this, we developed a landslide event catalogue for mainland China covering 2008 to 2024, based on news reports. The dataset was generated via large-scale web crawling, information extraction using an open-source large language model (LLM), event deduplication, geocoding, and multi-stage validation. It contains 1,582 events with detailed spatiotemporal attributes, some with minute-level temporal precision and spatial resolution down to the county, village, or specific reported site. Evaluation shows that, while casualty-related information is less accurate, the LLM reliably captures key attributes such as time, location, and triggering factors. This demonstrates the feasibility of using LLMs to extract critical landslide data from news reports. Compared with existing catalogues, our dataset offers more events and improved spatiotemporal accuracy, providing a valuable resource for landslide hazard assessment, early warning model development, and disaster risk management in China.
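
The event deduplication step in this pipeline can be pictured as grouping news reports that describe the same landslide. The abstract does not state the matching rule, so the sketch below assumes a hypothetical key of date plus normalized county name; the `Report` fields and sample records are likewise invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One extracted news report (hypothetical schema)."""
    date: str    # ISO date of the reported landslide
    county: str  # county-level location string
    source: str  # news outlet identifier


def deduplicate(reports):
    """Group reports into events keyed by (date, normalized county)."""
    events = {}
    for report in reports:
        key = (report.date, report.county.strip().lower())
        events.setdefault(key, []).append(report)
    return events


# Invented sample reports, not records from the catalogue.
reports = [
    Report("2020-07-08", "County A", "outlet-1"),
    Report("2020-07-08", " county a ", "outlet-2"),
    Report("2020-07-09", "County B", "outlet-1"),
]

events = deduplicate(reports)
print(len(events), "events from", len(reports), "reports")
```

A real pipeline would likely need fuzzier matching, for example tolerance for date discrepancies between outlets, which this exact-key sketch does not attempt.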

Citations: 0
A Three-Year Multimodal Holistic Dataset For Horticultural Tomato Cultivation.
IF 6.9 Zone 2 (Multidisciplinary Journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-20 DOI: 10.1038/s41597-026-07074-w
Yu Gong, Yifei He, Xuefeng Zhang, Ling Wang, Haibo You, Mo Zhou, Jie Liu

Existing tomato datasets often focus on short-term experiments or lack integrated environmental and agronomic data. We present Horti-M3-Tomato, a comprehensive three-year dataset collected in a greenhouse in Northeast China, including high-resolution RGB images, environmental sensor data (recorded every 30 minutes), soil conditions, and detailed agronomic records such as yield data and management practices. Spanning three growing seasons (2023-2025), the dataset integrates temporal imaging, environmental monitoring, soil data, and manual phenotypic and yield records. Horti-M3-Tomato supports research on growth dynamics and genotype-environment interactions, and provides a benchmark for AI-based phenotyping and precision horticulture. The dataset is openly available for further research in controlled-environment agriculture.

Citations: 0
Growth dynamics of 3,909 Escherichia coli single-gene knockouts in rich and minimal media.
IF 6.9 Zone 2 (Multidisciplinary Journals) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-19 DOI: 10.1038/s41597-026-07075-9
Zehui Lao, Bei-Wen Ying

High-throughput phenotyping of microbial growth is crucial for understanding genotype-phenotype relationships in systems biology. Linking genetic variation to dynamic growth responses across environments remains challenging. Here, we present a time series dataset representing the growth curves of 3,909 single-gene knockout Escherichia coli strains grown in rich (LB) and minimal (M63) media. Using microplate assays with biological triplicates at 37 °C, we generated 23,454 OD600 time-series trajectories (3,909 strains × 2 media × 3 replicates) recorded every 15 minutes for 24-48 hours. The dataset provides plate-background-corrected growth curves, derived growth parameters including carrying capacity (K) and maximal growth rate (r), and gene category annotations. This standardized resource facilitates comparative analyses of genotype-dependent growth dynamics between rich and poor nutritional conditions and supports methodological development for time-series processing and growth-phenotype characterization. By making the complete growth trajectories publicly available with metadata and quality indicators, we aim to enable reuse and reproducible analyses of bacterial growth dynamics across the Keio collection.
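
The carrying capacity (K) and maximal growth rate (r) mentioned in this abstract can be illustrated on a synthetic curve. The sketch below assumes a logistic growth model and a crude estimator (plateau maximum for K, steepest slope of ln OD for r); the abstract does not say how the authors derived their parameters, so this is an illustration rather than their procedure. It also checks the stated trajectory count.

```python
import math


def logistic_od(t, k, r, n0):
    """Logistic growth: OD at time t (hours) with capacity k, rate r, start n0."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


def estimate_growth(times, od):
    """Crude estimates: K as the maximum OD, r as the steepest slope of ln(OD).

    Assumes background-corrected, strictly positive OD readings.
    """
    carrying_capacity = max(od)
    max_rate = max(
        (math.log(od[i + 1]) - math.log(od[i])) / (times[i + 1] - times[i])
        for i in range(len(od) - 1)
    )
    return carrying_capacity, max_rate


# Synthetic curve sampled every 15 minutes for 24 hours (parameters invented).
times = [0.25 * i for i in range(97)]
od = [logistic_od(t, k=1.0, r=0.8, n0=0.01) for t in times]

k_est, r_est = estimate_growth(times, od)
n_trajectories = 3909 * 2 * 3  # strains x media x replicates

print(f"K ~ {k_est:.3f}, r ~ {r_est:.3f} per hour, trajectories = {n_trajectories}")
```

On noisy plate-reader data the slope would normally be taken over a sliding window or from a fitted model, since point-to-point differences of ln OD amplify noise at low density.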

{"title":"Growth dynamics of 3,909 Escherichia coli single-gene knockouts in rich and minimal media.","authors":"Zehui Lao, Bei-Wen Ying","doi":"10.1038/s41597-026-07075-9","DOIUrl":"https://doi.org/10.1038/s41597-026-07075-9","url":null,"abstract":"<p><p>High-throughput phenotyping of microbial growth is crucial for understanding genotype-phenotype relationships in systems biology. Linking genetic variation to dynamic growth responses across environments remains challenging. Here, we present a time series dataset representing the growth curves of 3,909 single-gene knockout Escherichia coli strains grown in rich (LB) and minimal (M63) media. Using microplate assays with biological triplicates at 37 °C, we generated 23,454 OD<sub>600</sub> time-series trajectories (3,909 strains × 2 media × 3 replicates) recorded every 15 minutes for 24-48 hours. The dataset provides plate-background-corrected growth curves, derived growth parameters including carrying capacity (K) and maximal growth rate (r), and gene category annotations. This standardized resource facilitates comparative analyses of genotype-dependent growth dynamics between rich and poor nutritional conditions and supports methodological development for time-series processing and growth-phenotype characterization. 
By making the complete growth trajectories publicly available with metadata and quality indicators, we aim to enable reuse and reproducible analyses of bacterial growth dynamics across the Keio collection.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":" ","pages":""},"PeriodicalIF":6.9,"publicationDate":"2026-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147487385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cell Type Populations for 3D Anatomical Structures of the Human Reference Atlas.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-03-19 DOI: 10.1038/s41597-026-06642-4
Andreas Bueckle, Bruce W Herr, Lu Chen, Daniel Bolin, Danial Qaurooni, Michael Ginda, Yashvardhan Jain, Aleix Puig-Barbe, Kristin Ardlie, Fusheng Wang, Katy Börner

The human body contains ~27-37 trillion cells of up to 10,000 cell types (CTs) within a volume of ~62-120 liters (males) and 52-89 liters (females). The Human Reference Atlas (HRA) v2.3 provides a quantitative 3D framework of CTs across 73 reference organs and 1,283 3D anatomical structures (ASs). The HRA Cell Type Population (HRApop) effort has quantified CTs per AS using high-quality single-cell datasets processed through scalable, reproducible workflows and cell type annotation (CTann) tools. HRApop v1.0 includes reference CT populations for 73 ASs (112 when sex-specific) using 662 datasets spatially registered to 230 locations across 17 organs (31 when sex-specific). For 558 single-cell (sc-)transcriptomics datasets (11,042,750 cells), CTs and biomarker expressions were computed using Azimuth, CellTypist, and popV. To test generalizability, 104 sc-proteomics datasets (16,576,863 cells) were integrated. In total, HRApop includes 27,619,613 cells and serves as a healthy reference for researchers aiming to elucidate mechanisms underlying cellular interactions as well as cellular and tissue level disease progression, which may facilitate advancements in basic discovery and lead to new therapeutic strategies.
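
The dataset and cell totals reported in this abstract are the sums of the transcriptomics and proteomics subsets. As a minimal sketch (the dictionary layout and the `combine` helper are hypothetical, and only the figures come from the abstract), the totals can be reproduced as follows:

```python
# Figures taken from the abstract; the structure and names are hypothetical.
TRANSCRIPTOMICS = {"datasets": 558, "cells": 11_042_750}
PROTEOMICS = {"datasets": 104, "cells": 16_576_863}


def combine(*groups: dict) -> dict:
    """Sum dataset and cell counts across the given groups."""
    return {
        "datasets": sum(g["datasets"] for g in groups),
        "cells": sum(g["cells"] for g in groups),
    }


if __name__ == "__main__":
    total = combine(TRANSCRIPTOMICS, PROTEOMICS)
    print(total["datasets"])  # 662
    print(total["cells"])     # 27619613
```

Both sums match the 662 datasets and 27,619,613 cells stated in the abstract.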

Citations: 0