Identifying Data-Driven Clinical Subgroups for Cervical Cancer Prevention With Machine Learning: Population-Based, External, and Diagnostic Validation Study.

JMIR Public Health and Surveillance (IF 3.9; Zone 2, Medicine; JCR Q1, Public, Environmental & Occupational Health) Pub Date: 2025-03-19 DOI: 10.2196/67840
Zhen Lu, Binhua Dong, Hongning Cai, Tian Tian, Junfeng Wang, Leiwen Fu, Bingyi Wang, Weijie Zhang, Shaomei Lin, Xunyuan Tuo, Juntao Wang, Tianjie Yang, Xinxin Huang, Zheng Zheng, Huifeng Xue, Shuxia Xu, Siyang Liu, Pengming Sun, Huachun Zou
JMIR Public Health and Surveillance, volume 11, e67840. Journal Article. https://doi.org/10.2196/67840 (PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11939026/pdf/)

Abstract

Background: Cervical cancer remains a major global health issue. Personalized, data-driven cervical cancer prevention (CCP) strategies tailored to phenotypic profiles may improve prevention and reduce disease burden.

Objective: This study aimed to identify subgroups with differential cervical precancer or cancer risks using machine learning, validate subgroup predictions across datasets, and propose a computational phenomapping strategy to enhance global CCP efforts.

Methods: We explored the data-driven CCP subgroups by applying unsupervised machine learning to a deeply phenotyped, population-based discovery cohort. We extracted CCP-specific risks of cervical intraepithelial neoplasia (CIN) and cervical cancer through weighted logistic regression analyses providing odds ratio (OR) estimates and 95% CIs. We trained a supervised machine learning model and developed pathways to classify individuals before evaluating its diagnostic validity and usability on an external cohort.

Results: This study included 551,934 women (median age, 49 years) in the discovery cohort and 47,130 women (median age, 37 years) in the external cohort. Phenotyping identified 5 CCP subgroups, with CCP4 showing the highest carcinoma prevalence. CCP2-4 had significantly higher risks of CIN2+ (CCP2: OR 2.07 [95% CI: 2.03-2.12], CCP3: 3.88 [3.78-3.97], and CCP4: 4.47 [4.33-4.63]) and CIN3+ (CCP2: 2.10 [2.05-2.14], CCP3: 3.92 [3.82-4.02], and CCP4: 4.45 [4.31-4.61]) compared to CCP1 (P<.001), consistent with the direction of results observed in the external cohort. The proposed triple strategy was validated as clinically relevant, prioritizing high-risk subgroups (CCP3-4) for colposcopies and scaling human papillomavirus screening for CCP1-2.

Conclusions: This study underscores the potential of leveraging machine learning algorithms and large-scale routine electronic health records to enhance CCP strategies. By identifying key determinants of CIN2+/CIN3+ risk and classifying 5 distinct subgroups, our study provides a robust, data-driven foundation for the proposed triple strategy. This approach prioritizes tailored prevention efforts for subgroups with varying risks, offering a novel and scalable tool to complement existing cervical cancer screening guidelines. Future work should focus on independent external and prospective validation to maximize the global impact of this strategy.
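
As a reading aid for the odds ratios quoted in the Results, the sketch below computes an OR and a Wald 95% CI from a single 2x2 table. This is a simplification and not the study's method: the authors report weighted logistic regression, whereas this example is unweighted and unadjusted. The function name `odds_ratio_wald_ci` and the cell counts are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases in the subgroup of interest
    b: non-cases in the subgroup of interest
    c: cases in the reference subgroup
    d: non-cases in the reference subgroup
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not data from the study).
or_, lo, hi = odds_ratio_wald_ci(a=200, b=800, c=100, d=900)
print(f"OR {or_:.2f} [95% CI: {lo:.2f}-{hi:.2f}]")
```

An interval that excludes 1, as with every CCP2-4 estimate reported above, indicates that the subgroup's odds differ from those of the reference subgroup (CCP1) at the chosen confidence level.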
