Title: Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A Review
Authors: Lingchao Mao; Hairong Wang; Leland S. Hu; Nhan L. Tran; Peter D. Canoll; Kristin R. Swanson; Jing Li
Journal: IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 10008-10028 (Impact Factor 6.4; JCR Q1, Automation & Control Systems)
Publication date: 2024-12-18
DOI: 10.1109/TASE.2024.3515839
URL: https://ieeexplore.ieee.org/document/10806835/
Citation count: 0
Abstract
Cancer remains one of the most challenging diseases to treat in the medical field. Machine learning (ML) has enabled in-depth analysis of complex patterns from large, diverse datasets, greatly facilitating “healthcare automation” in cancer diagnosis and prognosis. Despite these advancements, ML models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensional data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to addressing these challenges is to integrate biomedical knowledge into data-driven models, which has proven potential to improve the accuracy, robustness, and interpretability of model results. Here, we review the state-of-the-art ML studies that leverage the fusion of biomedical knowledge and data, termed knowledge-informed machine learning (KIML), to advance cancer diagnosis and prognosis. We provide an overview of the diverse forms of knowledge representation and the current strategies for integrating knowledge into machine learning pipelines, with concrete examples. We conclude the review by discussing future directions aimed at leveraging KIML to advance cancer research and healthcare automation. A live summary of the review is hosted at https://lingchm.github.io/kinformed-machine-learning-cancer/, offering an evolving resource to support research in this field.

Note to Practitioners: Recognizing the challenges posed by inter-patient and intratumoral heterogeneity, constrained sample size, and interpretability requirements in cancer applications, practitioners should consider integrating existing biomedical knowledge into their modeling frameworks. This strategy holds promise for enhancing model performance, robustness, and interpretability. We review classic machine learning and deep learning models for cancer diagnosis and prognosis that incorporate domain knowledge, spanning models that use clinical, imaging, molecular, and treatment data. The pros and cons of each integration approach are discussed. Key design questions, such as which knowledge to leverage, how to represent it effectively, and how to integrate it seamlessly into a model, need to be examined for each case. Collaboration between modeling scientists and medical experts is essential in this endeavor.
About the journal:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.