Scientific Data: Latest Articles

The missing link in FAIR data policy: biodata resources in life sciences.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-026-06690-w
Lucy Poveda, Gavin Farrell, Silvio C E Tosatto, Monique Zahn-Zabal, Patrick Ruch, Julien Gobeill, Robert M Waterhouse, Christophe Dessimoz
Citations: 0
EgyPLI: A Real-life Annotated Image Dataset for Egyptian Plant Leaf Identification.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-025-06539-8
Amany M Sarhan, Mahmoud A Shaheen

The Egyptian Plant Leaf Image Dataset (EgyPLI) is the first comprehensive collection of plant leaf images curated in Egypt to support research in automated plant identification. It addresses the lack of locally representative datasets and the broader need for geographically diverse data to enable the development of generalized models. EgyPLI contains real-world leaf images captured under varying viewpoints, lighting conditions, and background clutter, reflecting realistic agricultural environments. Unlike laboratory-controlled datasets, it includes natural noise and variability, supporting the training of robust deep learning models suitable for real deployment. The dataset is carefully annotated and preprocessed to establish a consistent standard for plant identification tasks. EgyPLI comprises 3,588 images covering eight widely cultivated plant species: apple, berry, fig, guava, orange, plum, persimmon, and tomato, including both healthy and diseased leaves. This diversity supports classification, diagnosis, and health assessment applications. To demonstrate its effectiveness, the dataset was evaluated using ResNet50, VGG16, and a custom CNN, achieving accuracies of 61.67%, 96.81%, and 99.22%, respectively. As an available resource, EgyPLI fills a critical gap.

Citations: 0
Establishing dermatopathology encyclopedia DermpathNet with Artificial Intelligence-Based Workflow.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-026-06715-4
Ziyang Xu, Mingquan Lin, Yiliang Zhou, Zihan Xu, Seth J Orlow, Shane A Meehan, Alexandra Flamm, Ata S Moshiri, Yifan Peng

Accessing high-quality, open-access dermatopathology image datasets for learning and cross-referencing is a common challenge for clinicians and trainees. To establish a comprehensive open-access dermatopathology dataset for educational, cross-referencing, and machine-learning purposes, we employed a hybrid workflow to curate and categorize images from the PubMed Central (PMC) repository. We used specific keywords to extract relevant images, and classified them using a novel hybrid method that combined deep learning-based image modality classification with figure caption analyses. Validation on 651 manually annotated images demonstrated the robustness of our workflow, with an F-score of 89.6% for the deep learning approach, 61.0% for the keyword-based retrieval method, and 90.4% for the hybrid approach. We retrieved over 7,772 images across 166 diagnoses and released this fully annotated dataset, reviewed by board-certified dermatopathologists. Using our dataset as a challenging task, we found the current image analysis algorithm from OpenAI inadequate for analyzing dermatopathology images. In conclusion, we have developed a large, peer-reviewed, open-access dermatopathology image dataset, DermpathNet, which features a semi-automated curation workflow.
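
For readers less familiar with the metric reported above, an F-score is commonly computed as the harmonic mean of precision and recall (often called F1). The sketch below assumes that definition; the abstract does not state which F-score variant or averaging scheme the authors used, and the function name and the counts in the example are hypothetical, chosen only for illustration.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score given true positives, false positives and false negatives.

    Assumes the common F1 definition (harmonic mean of precision and recall);
    the paper may use a different variant.
    """
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(f"{f_score(tp=8, fp=2, fn=2):.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a method with high recall but low precision (or the reverse) can score well below one that balances both.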

Citations: 0
Correction: Indonesia Election Archive: Institutions, candidates and results.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-026-06770-x
Khoirunnisa Agustyati, Heroik Pratama, Jóhanna K Birnir, Henry Overos, Noory Okthariza, Iqbal Kholidin, Fadli Ramadhanil, Amalinda Savirani
Citations: 0
A Chain-of-thought Reasoning Breast Ultrasound Dataset Covering All Histopathology Categories.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-026-06702-9
Haojun Yu, Youcheng Li, Zihan Niu, Nan Zhang, Xuantong Gong, Huan Li, Zhiying Zou, Haifeng Qi, Zhenxiao Cao, Zijie Lan, Xingjian Yuan, Jiating He, Haokai Zhang, Shengtao Zhang, Zicheng Wang, Dong Wang, Ziwei Zhao, Congying Chen, Yong Wang, Wangyan Qin, Qingli Zhu, Liwei Wang

Breast ultrasound (BUS) is an essential tool for diagnosing breast lesions, with millions of examinations per year. However, publicly available high-quality BUS benchmarks for AI development are limited in data scale and annotation richness. In this work, we present BUS-CoT, a BUS dataset for chain-of-thought (CoT) reasoning analysis, which contains 11,439 ultrasound images from 11,850 lesions and 4,838 patients, covering all 99 WHO-defined histopathology categories. For model training and evaluation, we provide a curated high-quality subset of 5,163 lesion-focused images annotated by experienced radiologists. To facilitate research on incentivizing CoT reasoning, we construct the reasoning processes based on observation, feature, diagnosis and pathology labels, annotated and verified by experienced experts. Moreover, by covering lesions of all histopathology types, we aim to facilitate robust AI systems in rare cases, which can be error-prone in clinical practice. The data and code are publicly available at https://doi.org/10.6084/m9.figshare.30838715.

Citations: 0
Human gut archaea collection from Estonian population.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-026-06742-1
Kateryna Pantiukh, Elin Org

While microbiota plays a crucial role in maintaining overall health, archaea, a component of microbiota, remain relatively unexplored. Here, we present a newly assembled set of archaeal metagenome-assembled genomes (MAGs) from 1,878 fecal microbiome samples. These MAGs were reconstructed from metagenomic reads of the Estonian Microbiome Deep (EstMB-deep) cohort, which were reused here specifically for archaeal MAG reconstruction. We identified 273 archaeal MAGs, representing 21 species and 144 strains, which we curated into the "EstMB MAGdb Archaea-273" MAGs collection.

Citations: 0
Correction: The Ouranos CRCM5-CMIP6 ensemble: A dynamically downscaled ensemble of CMIP6 simulations over North America.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-026-06772-9
Dominique Paquin, Christopher D McCray, Charles B Gauthier, Michel Giguère, Olivier Asselin, Pascal Bourgault, Marie-Pier Labonté, Dominic Matte
Citations: 0
Correction: A bimodal image dataset for seed classification from the visible and near-infrared spectrum.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-026-06771-w
Maksim Kukushkin, Martin Bogdan, Simon Goertz, Jan-Ole Callsen, Eric Oldenburg, Matthias Enders, Thomas Schmid
Citations: 0
A High Magnifications Histopathology Image Dataset for Oral Squamous Cell Carcinoma Diagnosis and Prognosis.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-026-06736-z
Jinquan Guan, Junhong Guo, Qi Chen, Jian Chen, Yongkang Cai, Yilin He, Zhiquan Huang, Yan Wang, Yutong Xie

Oral Squamous Cell Carcinoma (OSCC) is a prevalent and aggressive malignancy where deep learning-based computer-aided diagnosis and prognosis can enhance clinical assessments. However, existing publicly available OSCC datasets often suffer from limited patient cohorts and a restricted focus on either diagnostic or prognostic tasks, limiting the development of comprehensive and generalizable models. To bridge this gap, we introduce Multi-OSCC, a new histopathology image dataset comprising 1,325 OSCC patients, integrating both diagnostic and prognostic information to expand existing public resources. Each patient is represented by six high-resolution histopathology images captured at ×200, ×400, and ×1000 (two per magnification), covering both the core and edge tumor regions. The Multi-OSCC dataset is richly annotated for six critical clinical tasks: recurrence prediction (REC), lymph node metastasis (LNM), tumor differentiation (TD), tumor invasion (TI), cancer embolus (CE), and perineural invasion (PI). We systematically evaluate the impact of different visual encoders, multi-image fusion techniques, stain normalization, and multi-task learning frameworks to benchmark this dataset. To accelerate future research, we publicly release the Multi-OSCC dataset at: https://github.com/guanjinquan/OSCC-PathologyImageDataset.
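
The per-patient layout described above (three magnifications, two images each, six in total) can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the images cover core and edge tumor regions but does not state explicitly that each magnification has exactly one core and one edge image, and the region labels, patient ID format, and function name are hypothetical rather than the repository's actual naming.

```python
from itertools import product

MAGNIFICATIONS = (200, 400, 1000)  # magnifications listed in the abstract
REGIONS = ("core", "edge")         # assumption: one image per region at each magnification


def expected_views(patient_id: str) -> list[tuple[str, int, str]]:
    """List the (patient, magnification, region) combinations expected for one patient."""
    return [(patient_id, mag, region) for mag, region in product(MAGNIFICATIONS, REGIONS)]


if __name__ == "__main__":
    views = expected_views("P0001")  # hypothetical patient ID
    print(len(views))                # 6 images per patient
    print(1325 * len(views))         # 7950 images in total under this assumption
```

A check like this is a convenient way to spot patients with missing views before training a multi-image fusion model.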

Citations: 0
Multidimensional transcriptome dataset for systematic evaluation of Jakyakgamcho-tang-induced cell signatures.
IF 6.9, Zone 2 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-02-06, DOI: 10.1038/s41597-026-06759-6
Su-Jin Baek, Haeseung Lee, Sang-Min Park, Aeyung Kim, No Soo Kim, Eun-Hye Seo, A Yeong Lee, Yu Ri Kim, Wook Jin Kim, Kyu-Won Seo, Musun Park, Jin-Mu Yi, Seongwon Cha

Jakyakgamcho-tang (JGT), the simplest form of herbal medicine, comprises Paeoniae Radix (PR) and Glycyrrhizae Radix et Rhizoma (GR). It has been used to treat muscle-related diseases and inflammation. However, its pharmacological effects may vary with the proportions of ingredients and preparatory factors such as the extraction method. Nevertheless, gene expression datasets systematically reflecting these variables are lacking. A total of 513 transcriptome profiles were created with three concentrations and three replicates of RNA-seq data. This dataset structure will enable multidimensional analysis of the effects of various JGT preparation factors on gene expression; these factors include the PR to GR proportional ratio (2:1, 1:1, and 1:2), solvent (water or 70% ethanol), and extraction method (combined or individual extraction method). The HepG2, C2C12, and PC12 cell lines were targeted. All raw and preprocessed data are available through GEO. Standardized metadata and ingredient data are also provided. This dataset provides a foundation for exploring the effects of traditional herbal formulations on cellular transcriptomic responses and can facilitate the scientific optimization of herbal medicines.

Citations: 0