Molecular Classification of Breast Cancer Using Weakly Supervised Learning
Wooyoung Jang, Jonghyun Lee, Kyong Hwa Park, Aeree Kim, Sung Hak Lee, Sangjeong Ahn
Cancer Research and Treatment, pp. 116-125. Published 2025-01-01 (Epub 2024-06-25).
DOI: https://doi.org/10.4143/crt.2024.113
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729310/pdf/
Abstract
Purpose: Molecular classification of breast cancer is crucial for effective treatment. With the rise of digital pathology, weakly supervised learning on whole-slide images has gained prominence in developing deep learning models because it alleviates the need for extensive manual annotation. In this study, weakly supervised learning was employed to classify the molecular subtypes of breast cancer.
Materials and methods: We used two whole-slide image datasets: one consisting of breast cancer cases from the Korea University Guro Hospital (KG) and the other from The Cancer Genome Atlas (TCGA). We visualized the inferred results using attention-based heat maps and reviewed the histomorphological features of the most attentive patches.
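The abstract does not specify the network architecture, so the following is only a minimal sketch of one common way attention-based weak supervision is set up: attention pooling over patch embeddings in a multiple-instance learning setting. The function name `attention_pool`, the parameters `V` and `w`, the embedding sizes, and the random toy data are all assumptions for illustration, not the authors' implementation. The per-patch weights are the kind of quantity a heat map would display, and sorting them gives the most attentive patches.

```python
import numpy as np

def attention_pool(patch_features, V, w):
    """Attention pooling over the patch embeddings of one slide.

    patch_features: (n_patches, d) array of patch embeddings.
    V: (d, h) projection matrix, w: (h,) scoring vector. Both are
    hypothetical parameters; in a real model they would be learned.
    Returns the slide-level embedding and the per-patch attention weights.
    """
    scores = np.tanh(patch_features @ V) @ w          # one score per patch
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over patches
    slide_embedding = weights @ patch_features        # weighted mean, shape (d,)
    return slide_embedding, weights

# Toy data (assumed): 50 patches with 16-dimensional embeddings.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 16))
V = rng.normal(size=(16, 8))
w = rng.normal(size=8)

slide_embedding, weights = attention_pool(features, V, w)

# Indices of the five patches with the highest attention weights.
top_patches = np.argsort(weights)[::-1][:5]
```

Only a slide-level label is needed to train such a model, because the slide embedding, not each patch, is passed to the classifier.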
Results: The KG+TCGA-trained model achieved an area under the receiver operating characteristic curve (AUROC) of 0.749. An inherent challenge was the imbalance among subtypes; in addition, discrepancies between the two datasets resulted in different molecular subtype proportions. To mitigate this imbalance, we merged the two datasets, and the resulting model exhibited improved performance. The attentive patches correlated well with widely recognized histomorphologic features. The triple-negative subtype showed a high incidence of high-grade nuclei, tumor necrosis, and intratumoral tumor-infiltrating lymphocytes, whereas the luminal A subtype showed a high incidence of collagen fibers.
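For readers unfamiliar with the metric, the sketch below shows one way an AUROC can be computed for a multi-class subtype classifier. The abstract does not say how the reported value was aggregated across subtypes, so the one-vs-rest macro average used here is an assumption, and the function names `binary_auroc` and `macro_ovr_auroc` are hypothetical.

```python
import numpy as np

def binary_auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half. Assumes at least one positive and one negative.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()   # positive ranked above negative
    ties = (pos[:, None] == neg[None, :]).sum()     # equal scores
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def macro_ovr_auroc(y_true, prob):
    """Unweighted mean of one-vs-rest AUROCs (assumed aggregation scheme).

    y_true: (n,) integer class labels; prob: (n, n_classes) predicted scores.
    """
    y_true = np.asarray(y_true)
    prob = np.asarray(prob, dtype=float)
    per_class = [binary_auroc(y_true == k, prob[:, k]) for k in range(prob.shape[1])]
    return float(np.mean(per_class))

# Small binary check: three of the four positive/negative pairs are ordered correctly.
example = binary_auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 0.75
```

Because the unweighted macro average gives each subtype equal weight regardless of how many cases it has, rare subtypes influence the score as much as common ones, which matters when subtype proportions are imbalanced.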
Conclusion: The artificial intelligence (AI) model based on weakly supervised learning showed promising performance. A review of the most attentive patches provided insights into the predictions of the AI model. Such AI models could become valuable screening tools that reduce costs and workloads in practice.
About the journal:
Cancer Research and Treatment is a peer-reviewed open access publication of the Korean Cancer Association. It is published quarterly, with one volume per year. Its abbreviated title is Cancer Res Treat. It accepts manuscripts relevant to experimental and clinical cancer research. Subjects include carcinogenesis, tumor biology, molecular oncology, cancer genetics, tumor immunology, epidemiology, predictive markers and cancer prevention, pathology, cancer diagnosis, screening, and therapies including chemotherapy, surgery, radiation therapy, immunotherapy, gene therapy, multimodality treatment, and palliative care.