Using machine learning algorithms to predict colorectal cancer
Xingjian Xiao, Bo Hong, Kubra Maqsood, Xiaohan Yi, Guoqun Xie, Hailei Zhao, Bo Sun, Jianying Mao, Shiyou Liu, Xianglong Xu
The Lancet Regional Health: Western Pacific, volume 55, Article 101355. Published 2025-02-01. DOI: 10.1016/j.lanwpc.2024.101355
https://www.sciencedirect.com/science/article/pii/S2666606524003493
Citations: 0
Abstract
Background
Colorectal cancer (CRC) is the second most common type of cancer in China, and middle-aged and elderly adults are at high risk. However, the colonoscopy examination rate in this group is very low. As of 2020, the colonoscopy examination rate in China was 914.8 per 100,000 people, and the distribution across regions was extremely uneven. Given the high incidence and mortality of colorectal cancer and the low colonoscopy rate among people with a positive initial screening result, further interventions are needed. The objective of this study was to use machine learning and data from 0.2 million consultations to predict colorectal cancer and identify important predictors.
Methods
Our study was based on a population-based cross-sectional survey. We used data from 5,664 cases with colonoscopy results out of 49,701 initial positive consultations in the colorectal cancer screening project in Baoshan District, Shanghai, from 2013 to 2021. Multiple machine learning models, including an adaptive boosting classifier and a gradient boosting machine, were established to predict colorectal cancer. For the outcome, patients diagnosed with colorectal cancer on clinical colonoscopy were considered to have colorectal cancer. A model was considered acceptable for predicting colorectal cancer if its area under the curve (AUC) exceeded 0.7. The optimal model was used to identify predictors of colorectal cancer.
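The acceptability rule above (AUC exceeding 0.7) can be illustrated with a minimal sketch. It computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function names `roc_auc` and `is_acceptable` and the toy labels and scores are assumptions made for illustration; they are not the study's code or data.

```python
from typing import Sequence

ACCEPTABLE_AUC = 0.7  # threshold stated in the Methods


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def is_acceptable(auc: float, threshold: float = ACCEPTABLE_AUC) -> bool:
    """A model is acceptable only if its AUC strictly exceeds the threshold."""
    return auc > threshold


# Hypothetical toy data: 1 = colorectal cancer on colonoscopy, 0 = no cancer.
labels = [0, 0, 1, 0, 1, 1, 0, 0]
scores = [0.10, 0.30, 0.35, 0.40, 0.80, 0.70, 0.20, 0.60]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}, acceptable: {is_acceptable(auc)}")
```

The pairwise form is quadratic in the number of cases, so it is meant only to make the definition concrete; a rank-based implementation from a standard library would be the usual choice on a dataset of this size.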
Findings
The incidence of colorectal cancer and the colonoscopy rate were 3.58% (203/5,664) and 11.4% (5,664/49,701), respectively. Non-invasive predictors such as sociodemographic information, behavioural history, and medical history were used to predict the current occurrence of colorectal cancer. In predicting the occurrence of colorectal cancer, the accuracy of Gradient Boosting Machine, Support Vector Machine, and Light Gradient Boosting Machine reached 0.86, while the accuracy of eXtreme Gradient Boosting reached 0.84. Among the variables predicting colorectal cancer, age, occupation, education, history of bowel cancer in first-degree relatives, and history of cholecystitis were important predictors.
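The two rates reported above follow directly from the counts in the abstract, as the short sketch below shows. It also computes, purely as context for reading an accuracy figure on data this imbalanced, the accuracy of a rule that labels every case as non-cancer on the full set of 5,664 cases. That baseline is an illustrative calculation, not a result from the study, and the abstract does not say on which split or sampling scheme the reported accuracies were measured. The helper `rate` and the constant names are assumptions.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a proportion."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Counts taken from the abstract.
INITIAL_POSITIVE = 49_701   # initial positive consultations
WITH_COLONOSCOPY = 5_664    # cases with colonoscopy results
CRC_CASES = 203             # colorectal cancer found on colonoscopy

colonoscopy_rate = rate(WITH_COLONOSCOPY, INITIAL_POSITIVE)
crc_rate = rate(CRC_CASES, WITH_COLONOSCOPY)

# Illustrative only: accuracy of always predicting "no cancer" on all 5,664 cases.
majority_baseline_accuracy = 1.0 - crc_rate

print(f"Colonoscopy rate: {colonoscopy_rate:.1%}")
print(f"CRC rate among colonoscopies: {crc_rate:.2%}")
print(f"Always-negative baseline accuracy: {majority_baseline_accuracy:.2%}")
```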
Interpretation
Machine learning methods using non-invasive predictors can accurately predict colorectal cancer in individuals with positive initial screening results. Our predictive models can provide additional risk information for colorectal cancer, which may help increase the colonoscopy examination rate among individuals with positive initial screening results, in whom colonoscopy rates are low. Our machine learning models can enhance screening rates, aiding in disease prevention.
Funding
This study was supported by Health Promotion and Education of the Key medical Specialty of Baoshan District, Shanghai (BSZK-2023-BZ14), Traditional Chinese medicine research project of Shanghai Municipal Health Commission (20240N108), and Construction of Traditional Chinese Medicine Inheritance and innovation Development Demonstration Pilot Projects in Pudong New Area - High-Level Research-Oriented Traditional Chinese Medicine Hospital Construction (C-2023-0901).
About the journal:
The Lancet Regional Health – Western Pacific, a gold open access journal, is an integral part of The Lancet's global initiative advocating for healthcare quality and access worldwide. It aims to advance clinical practice and health policy in the Western Pacific region, contributing to enhanced health outcomes. The journal publishes high-quality original research shedding light on clinical practice and health policy in the region. It also includes reviews, commentaries, and opinion pieces covering diverse regional health topics, such as infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, aging health, mental health, the health workforce and systems, and health policy.