Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A Review

IF 6.4; Zone 2, Computer Science; Q1, AUTOMATION & CONTROL SYSTEMS; IEEE Transactions on Automation Science and Engineering; Pub Date: 2024-12-18; DOI: 10.1109/TASE.2024.3515839

Lingchao Mao, Hairong Wang, Leland S. Hu, Nhan L. Tran, Peter D. Canoll, Kristin R. Swanson, Jing Li

IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 10008-10028. Available: https://ieeexplore.ieee.org/document/10806835/

Citations: 0

Abstract

Cancer remains one of the most challenging diseases to treat in the medical field. Machine learning (ML) has enabled in-depth analysis of complex patterns in large, diverse datasets, greatly facilitating “healthcare automation” in cancer diagnosis and prognosis. Despite these advances, ML models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensional data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to addressing these challenges is to integrate biomedical knowledge into data-driven models, which has demonstrated potential to improve the accuracy, robustness, and interpretability of model results. Here, we review state-of-the-art ML studies that leverage the fusion of biomedical knowledge and data, termed knowledge-informed machine learning (KIML), to advance cancer diagnosis and prognosis. We provide an overview of diverse forms of knowledge representation and of current strategies for integrating knowledge into machine learning pipelines, with concrete examples. We conclude by discussing future directions for leveraging KIML to advance cancer research and healthcare automation. A live summary of the review is hosted at https://lingchm.github.io/kinformed-machine-learning-cancer/, offering an evolving resource to support research in this field.

Note to Practitioners: Recognizing the challenges posed by inter-patient and intratumoral heterogeneity, constrained sample sizes, and interpretability requirements in cancer applications, practitioners should consider integrating existing biomedical knowledge into their modeling frameworks. This strategy holds promise for enhancing model performance, robustness, and interpretability. We review classic machine learning and deep learning models that incorporate domain knowledge into cancer diagnosis and prognosis, spanning models that use clinical, imaging, molecular, and treatment data. The pros and cons of each integration approach are discussed. Key design questions, such as which knowledge to leverage, how to represent it effectively, and how to integrate it seamlessly into a model, need to be examined case by case. Collaboration between modeling scientists and medical experts is essential in this endeavor.
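
As a concrete illustration of integrating knowledge into a data-driven model, the following is a minimal sketch of one widely used pattern, chosen here for illustration rather than taken from the review: a prior network over features (for example, known gene-gene interactions) is encoded as a graph Laplacian and added as a penalty to a linear regression. The function names `graph_laplacian` and `knowledge_regularized_regression`, the adjacency matrix, and the synthetic data are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def graph_laplacian(adjacency: np.ndarray) -> np.ndarray:
    """Return the combinatorial Laplacian L = D - A of an undirected graph.

    `adjacency` is a symmetric (p x p) matrix whose entry A[i, j] > 0 encodes
    assumed prior knowledge that features i and j are related (hypothetical
    example: two genes in the same pathway).
    """
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def knowledge_regularized_regression(
    X: np.ndarray,
    y: np.ndarray,
    adjacency: np.ndarray,
    lam: float = 1.0,
    ridge: float = 1e-6,
) -> np.ndarray:
    """Minimize ||y - X b||^2 + lam * b^T L b + ridge * ||b||^2 in closed form.

    Because b^T L b = sum over pairs i < j of A[i, j] * (b_i - b_j)^2,
    the penalty pulls coefficients of linked features toward each other.
    The small ridge term keeps the normal equations well-conditioned.
    """
    laplacian = graph_laplacian(adjacency)
    n_features = X.shape[1]
    # Setting the gradient to zero gives (X^T X + lam L + ridge I) b = X^T y.
    lhs = X.T @ X + lam * laplacian + ridge * np.eye(n_features)
    rhs = X.T @ y
    return np.linalg.solve(lhs, rhs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 30 samples, 4 features; features 0 and 1 have similar true effects.
    X = rng.normal(size=(30, 4))
    true_coef = np.array([1.0, 1.2, 0.0, -0.5])
    y = X @ true_coef + 0.1 * rng.normal(size=30)

    # Hypothetical prior knowledge: features 0 and 1 are linked.
    A = np.zeros((4, 4))
    A[0, 1] = A[1, 0] = 1.0

    for lam in (0.0, 100.0):
        coef = knowledge_regularized_regression(X, y, A, lam=lam)
        print(f"lam={lam:>5}: coef={np.round(coef, 3)}, gap={abs(coef[0] - coef[1]):.3f}")
```

With `lam=0.0` the fit reduces, up to the tiny ridge term, to ordinary least squares; increasing `lam` shrinks the gap between the linked coefficients, trading some fit to the data for agreement with the assumed prior. Whether such a penalty helps in practice depends on how reliable the prior network is for the data at hand.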


Source journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)

CiteScore: 12.50
Self-citation rate: 14.30%
Articles published: 404
Review time: 3.0 months

Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.