Scientific Data: Latest Articles

Transcriptomic Resource of Trissolcus cultratus: A Key Biological Control Agent for Halyomorpha halys.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-23 DOI: 10.1038/s41597-026-06617-5
Feng-Qi Li, Yong-Zhi Zhong, Tim Haye, Francesco Tortorici, Sofia Victoria Prieto, Li Wang, Zi-Jian Song, Jin-Ping Zhang

Trissolcus cultratus, a parasitoid wasp of the brown marmorated stink bug (BMSB), exhibits divergent parasitic capacities between Chinese and Swiss populations: Chinese strains reproduce successfully on fresh and cold-stored host eggs under both laboratory and field conditions, whereas Swiss strains fail to develop in fresh BMSB eggs. We sequenced and assembled the first T. cultratus transcriptome. A total of 184,932,102 and 195,101,432 clean reads from the Chinese and Swiss strains, respectively, were de novo assembled into 19,280 and 16,322 unigenes. From these assemblies, 9,811 and 9,582 protein-coding genes were predicted for the two strains. Among the 19,280 and 16,322 unigenes, we further identified 554 and 557 transcription factors in the Chinese and Swiss strains, respectively. This work presents the first transcriptomic dataset for T. cultratus, offering a valuable foundation for subsequent research on its population genetics.

Citations: 0
Chromosome-level genome assembly of the stonefly Rhopalopsole triangulispina Mo and Li, 2025 (Plecoptera: Leuctridae).
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-23 DOI: 10.1038/s41597-026-06631-7
Aili Lin, Jinjun Cao, Dávid Murányi, Ding Yang, Weihai Li, Raorao Mo

The superfamily Nemouroidea (Plecoptera) represents one of the most diverse and ecologically significant groups of stoneflies, with nymphs serving as crucial bioindicators of freshwater ecosystem health due to their sensitivity to water quality. However, the evolutionary and genomic studies of this group have been hindered by the lack of high-quality reference genomes. Here, we present a chromosome-level genome assembly for Rhopalopsole triangulispina Mo and Li, 2025 within Nemouroidea, generated by integrating PacBio HiFi long reads, Illumina short reads, and Hi-C chromatin interaction data. The final assembly spans 347.119 Mb with a scaffold N50 of 27.479 Mb, and 96.91% (336.39 Mb) of the genome is anchored to 13 pseudochromosomes. BUSCO assessment reveals a high completeness of 98.4% (insecta_odb10). The genome contains 48.50% repetitive elements (168.35 Mb) and encodes 12,857 protein-coding genes, which were comprehensively annotated using homology, transcriptomic, and ab initio evidence. This high-quality genome provides a foundational resource for resolving phylogenetic relationships within Nemouroidea, advancing studies on insect genome evolution, and enhancing freshwater biomonitoring efforts through genomic tools.
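
As a rough illustration of the scaffold N50 statistic quoted above, the following Python sketch computes N50 from a list of sequence lengths and re-derives the anchored and repeat fractions from the reported totals. The toy length list and the function name `n50` are illustrative assumptions, not part of the published pipeline.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical, for illustration only).
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)

# Consistency checks using figures reported in the abstract (all in Mb).
assembly_mb = 347.119
anchored_mb = 336.39
repeat_mb = 168.35
anchored_pct = round(100 * anchored_mb / assembly_mb, 2)
repeat_pct = round(100 * repeat_mb / assembly_mb, 2)
```

With the toy input, the two longest sequences (6 and 5) already cover more than half of the total of 20, so the sketch reports the shorter of them. The two percentages recomputed from the reported sizes agree with the 96.91% and 48.50% stated in the abstract.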

Citations: 0
SYSU_Topo: a 1-arc-minute global bathymetry from SWOT-derived gravity using the gravity-geological method.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-23 DOI: 10.1038/s41597-026-06641-5
Wei Feng, Dechao An, Cheinway Hwang, Mingzhi Sun, Xiaodong Chen, Zeyuan Zhang, Meng Yang, Min Zhong

The Surface Water and Ocean Topography (SWOT) mission provides novel perspectives. In this study, a new global bathymetric product, SYSU_Topo, was developed using gravity anomalies (the SWOT_02 model released by Scripps Institution of Oceanography) and the high-precision gravity-geological method (GGM). The data (NetCDF format; global range from 80°S to 80°N, 1-arc-minute resolution; variables: lat, lon, z) and processing code are openly available for immediate reuse in ocean modeling, geophysics, and seafloor mapping. To reliably obtain the optimal density contrast for GGM, a sliding-window strategy of partition inversion was adopted, and a fusion method with boundary-constraint points was developed to eliminate the splicing effect of partition inversion. The model has been validated with 11,167,583 single-beam bathymetric points and newly added multibeam grid points from GEBCO_2024. The SYSU_Topo model achieves superior performance in the South China Sea, with a standard deviation of 132.07 m, an improvement of 8%-26% over other models. Compared to traditional altimeter-derived gravity anomalies, SWOT data exhibit greater potential for filling regions that lack high-precision bathymetry.
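
The kind of validation statistic quoted above can be sketched as follows, assuming it is the standard deviation of the differences between model depths and independent soundings at the same locations (the abstract does not spell this out). The sample values and the names `difference_stats`, `model_depths`, and `sounding_depths` are hypothetical; the actual validation against GEBCO_2024 points is not reproduced here.

```python
from statistics import mean, pstdev


def difference_stats(model_depths, sounding_depths):
    """Return (mean difference, standard deviation of differences) in metres
    for model depths compared against co-located soundings."""
    if len(model_depths) != len(sounding_depths):
        raise ValueError("inputs must have the same length")
    diffs = [m - s for m, s in zip(model_depths, sounding_depths)]
    return mean(diffs), pstdev(diffs)


# Hypothetical depths in metres (negative below sea level).
model_depths = [-1500.0, -2300.0, -3100.0, -4200.0]
sounding_depths = [-1490.0, -2320.0, -3090.0, -4220.0]

bias_m, std_m = difference_stats(model_depths, sounding_depths)
```

Here the population standard deviation is used; a sample standard deviation (`statistics.stdev`) would be an equally plausible choice for a small check set.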

Citations: 0
High-Resolution Leaf Image Sequences with Geometric Alignment for Dynamic Phenotyping of Foliar Diseases.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-23 DOI: 10.1038/s41597-026-06567-y
Jonas Anderegg, Bruce A McDonald

Time-resolved phenotyping of disease symptoms enables dissection of resistance mechanisms and improves diagnosis, but acquiring phenotypic data at satisfactory scale remains challenging. Advances in imaging and image processing have improved measurement precision, robustness, and throughput, but further improvements are needed for practical application. We present a data set comprising 12,520 high-resolution (~0.03 mm/pixel) RGB images representing 1,032 time series of wheat leaves with developing disease symptoms. All images are geometrically aligned with a median precision of 0.16 mm (≈5 pixels). The dataset includes transformation matrices, symptom segmentation masks, metadata on treatments, weather, crop phenology, and disease occurrence, and a lightweight Python toolkit for loading, aligning, inspecting, and editing image sequences. These resources enable detailed investigation of leaf-level disease dynamics such as lesion, pustule, and fruiting body emergence rates, lesion growth, and dynamic interactions of disease development with spatial and environmental contexts. They offer a broad basis for developing improved methods for image alignment and symptom detection, segmentation, and tracking, possibly by tackling these connected challenges within a single end-to-end framework.
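
To make the alignment figures concrete, the sketch below converts the reported median precision from millimetres to pixels and applies a 3x3 transformation matrix to a point in homogeneous coordinates. The matrix values and the helper `apply_transform` are hypothetical; the dataset's actual matrix format and the toolkit's API are not described in the abstract.

```python
def apply_transform(matrix, point):
    """Map an (x, y) point through a 3x3 transformation matrix
    using homogeneous coordinates."""
    x, y = point
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this transform")
    return (xh / w, yh / w)


# Approximate pixel size reported in the abstract (~0.03 mm/pixel).
MM_PER_PIXEL = 0.03
median_precision_px = 0.16 / MM_PER_PIXEL  # roughly 5 pixels

# Hypothetical matrix: a pure translation by (+12, -7) pixels.
translation = [
    [1.0, 0.0, 12.0],
    [0.0, 1.0, -7.0],
    [0.0, 0.0, 1.0],
]
aligned_point = apply_transform(translation, (100.0, 50.0))
```

The same function handles affine and projective matrices, since the division by `w` only matters when the bottom row differs from (0, 0, 1).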

Citations: 0
An Extended VIIRS-like Artificial Nighttime Light Data Reconstruction (1986-2024).
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-23 DOI: 10.1038/s41597-026-06549-0
Yihe Tian, Kwan Man Cheng, Zhengbo Zhang, Tao Zhang, Junning Feng, Zhehao Ren, Suju Li, Dongmei Yan, Bing Xu

Artificial Night-Time Light (NTL) remote sensing is a vital proxy for quantifying the intensity and spatial distribution of human activities. Although the NPP-VIIRS sensor provides high-quality NTL observations, its temporal coverage, which begins in 2012, restricts long-term time-series studies that extend to earlier periods. Current extended VIIRS-like NTL data products suffer from two significant shortcomings: the underestimation of light intensity and the omission of structural details. To overcome these limitations, we present the Extended VIIRS-like Artificial Nighttime Light (EVAL) dataset, a new annual NTL dataset for China spanning from 1986 to 2024. This dataset was generated using a novel two-stage deep learning model designed to address the aforementioned shortcomings. The model first constructs an initial estimate and subsequently refines fine-grained structural details using high-resolution impervious surface data as guidance. Quantitative evaluations demonstrate that EVAL significantly outperforms state-of-the-art products, exhibiting superior temporal consistency and a stronger correlation with socioeconomic indicators.

Citations: 0
An annotated dataset of Gram stains from positive blood cultures.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-23 DOI: 10.1038/s41597-026-06651-3
Qiaolian Yi, Xiaoyan Gou, Renyuan Zhu, Xiuli Xie, Mengting Hu, Xing Wang, Tai'e Wang, Kaiwen Xu, Ying-Chun Xu

Bloodstream infections (BSIs) cause high morbidity and mortality across all age groups and require urgent, accurate intervention. Gram stain interpretation of positive blood cultures (PBCs) is crucial for the early diagnosis of BSIs, yet this manual process is labor-intensive, time-consuming, and highly operator-dependent. Artificial intelligence (AI)-assisted microscopic interpretation of stained smears is beneficial to microbiology diagnostics. To support automatic identification of blood-culture Gram stains, this study introduces a dataset of Gram-stained smears collected in clinical practice. The dataset includes 505 microscopic images covering up to 57 species associated with BSIs, with a total of 7,528 annotations. The annotations are categorized by staining characteristics and morphological features into cocci, bacilli, and fungi. We trained and validated an object detection model based on the YOLOv10 architecture on this dataset to automatically localize and classify these morphological categories in microscopic images. The publicly released dataset will support the development of AI tools that automatically interpret Gram stains from PBCs for routine clinical application.
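
Object detectors are commonly evaluated by matching predicted boxes to annotated boxes using intersection over union (IoU). The sketch below shows that computation, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max); the abstract does not specify the annotation format or the evaluation protocol, so the box layout, the function name `iou`, and the example boxes are all hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes,
    each given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical annotated and predicted boxes (pixel coordinates).
annotated = (0.0, 0.0, 10.0, 10.0)
predicted = (5.0, 5.0, 15.0, 15.0)
overlap = iou(annotated, predicted)          # 25 / 175
disjoint = iou(annotated, (20.0, 20.0, 30.0, 30.0))
```

A prediction is then typically counted as correct when its IoU with an annotation of the same class exceeds a chosen threshold.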

Citations: 0
A Chinese Elementary Science Question Dataset in Problem-Solving Process Generation.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-23 DOI: 10.1038/s41597-026-06618-4
Dong Li, Zhi Liu, Chaodong Wen, Jiaxin Guo, Taotao Long, Xian Peng

Although large language models (LLMs) demonstrate significant potential for advancing personalized science education, they face challenges in generating science problem-solving processes adapted to students' grade levels. In this paper, we developed a Chinese Science Question (CSQ) dataset, which comprises both a benchmark and a training set, aiming to evaluate and enhance the science problem-solving capabilities of LLMs. The CSQ consists of 12,000 high-quality samples featuring a variety of question types and diverse discipline properties, covering four subjects and multiple topics at the Chinese primary school level. We further designed the language model to reflect these discipline properties in the generated responses, emulating the thought process of students when solving science questions. We demonstrated that CSQ and its extensive annotations can be employed for fine-tuning models. This was confirmed through both automatic and human evaluations, particularly in generating problem-solving processes that are aligned with students' grade levels.

Citations: 0
In-vivo optical properties spectra across five body locations on ten subjects using time-domain diffuse optics.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-22 DOI: 10.1038/s41597-026-06586-9
Vamshi Damagatla, Siënna Karremans, Alessandro Bossi, Edoardo Martinenghi, Srirang Manohar, Paola Taroni, Rinaldo Cubeddu, Antonio Pifferi, Ilaria Bargigia

We present a comprehensive dataset of absorption and reduced scattering spectra collected via time-domain diffuse optical spectroscopy in the 610-1110 nm range, across 10 subjects and at 5 different body locations: the upper arm, the radius-ulna region, the abdomen, the forehead, and the calcaneus. Ultrasound images acquired at the same locations are also included and, together with the demographic information, offer useful insight into inter-subject variability. The dataset, openly available on Zenodo, contains the raw data, the metadata, and the tools to operate on them. It can be used to devise light-based diagnostic or therapeutic techniques, to appreciate biological variability, and to test different models of photon migration.

Citations: 0
The dataset for extending EMNIST evaluation.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-22 DOI: 10.1038/s41597-025-06291-z
Julian Szymański, Kacper Skarżyński, Błażej Szutenberg, Klaudia Ratkowska, Szymon Drywa

This paper describes a dataset for deeper evaluation of machine learning models for handwritten character recognition. For that purpose, we built a dataset that, combined with the existing NIST databases, allows additional analysis of the models built on these data. The paper summarizes the most popular publicly available machine learning models trained on the EMNIST-letters dataset. We discuss issues with how state-of-the-art results have been evaluated, namely by comparing accuracy on a test set built in a cross-validation setting. We propose additional evaluation on new, independently constructed data, unaffiliated with the NIST database authors. The dataset and source code have been made available through the Gdansk Tech University repository, Most Wiedzy.

Scientific Data 13(1), 73. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12827260/pdf/
Citations: 0
Telomere-to-telomere gap-free genome assembly of the Opsariichthys evolans (Cypriniformes: Cyprinidae).
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-01-22 DOI: 10.1038/s41597-026-06588-7
Pan Wang, Xinyue Wang, Denghua Yin, Jie Liu, Min Jiang, Kai Liu

Opsariichthys evolans is a stream-dwelling fish species endemic to China, distributed primarily across southeastern China and Taiwan. Initially classified under the genus Zacco, it was later reclassified into the genus Opsariichthys based primarily on mitochondrial DNA evidence. However, this taxonomic revision remains partially inconclusive due to the absence of whole-genome data. Therefore, we generated a telomere-to-telomere, gap-free genome assembly of O. evolans, consisting of 39 chromosomes with one contiguous sequence per chromosome. The assembly had a total size of 886.9 Mb and a contig N50 of 25.44 Mb. The presence of the telomere repeat was clearly confirmed in the genome. BUSCO assessment confirmed 99.34% genome completeness. Collinearity analysis revealed high synteny among O. evolans, O. bidens and Zacco platypus. Genomic comparisons revealed key candidate genes and related biological pathways potentially responsible for color patterning and hydrodynamic adaptation. The complete O. evolans genome provides insights into its genome structure and function, and supports the taxonomic reclassification between the genera Opsariichthys and Zacco.

Citations: 0