Publicly available datasets of breast histopathology H&E whole-slide images: A scoping review

Masoud Tafavvoghi, Lars Ailo Bongo, Nikita Shvetsov, Lill-Tove Rasmussen Busund, Kajsa Møllersen
Journal of Pathology Informatics, published 2024-02-01
DOI: 10.1016/j.jpi.2024.100363
URL: https://www.sciencedirect.com/science/article/pii/S2153353924000026

Abstract

Advances in digital pathology and computing resources have had a significant impact on computational pathology for breast cancer diagnosis and treatment. However, limited access to high-quality labeled histopathological images of breast cancer remains a major challenge that hinders the development of accurate and robust deep learning models. In this scoping review, we identified the publicly available datasets of breast H&E-stained whole-slide images (WSIs) that can be used to develop deep learning algorithms. We systematically searched 9 scientific literature databases and 9 research data repositories and found 17 publicly available datasets containing 10 385 H&E WSIs of breast cancer. We also reported image metadata and characteristics for each dataset to help researchers select suitable datasets for specific tasks in breast cancer computational pathology. In addition, we compiled 2 lists, covering breast H&E patch datasets and private datasets, as supplementary resources for researchers. Notably, only 28% of the included articles used multiple datasets, and only 14% used an external validation set, suggesting that the performance of the other developed models may be susceptible to overestimation. The TCGA-BRCA dataset was used in 52% of the selected studies. This dataset has considerable selection bias that can affect the robustness and generalizability of the trained algorithms. Metadata reporting for breast WSI datasets is also inconsistent, which can hinder the development of accurate deep learning models and indicates the need for explicit guidelines for documenting breast WSI dataset characteristics and metadata.
