EEC-GIFT: a fairness-aware machine learning framework for lung cancer screening eligibility using real-world data
Piyawan Conahan, Lary A Robinson, Trung Le, Gilmer Valdes, Matthew B Schabath, Margaret M Byrne, Lee Green, Issam El Naqa, Yi Luo
JNCI Cancer Spectrum. Published 2025-03-20. DOI: 10.1093/jncics/pkaf030 (https://doi.org/10.1093/jncics/pkaf030)
Abstract
Objective: We use real-world data to develop a lung cancer screening (LCS) eligibility mechanism that is both accurate and free from racial bias.
Methods: Our data came from the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial. We built a systematic fairness-aware machine learning framework by integrating a Group and Intersectional Fairness and Threshold (GIFT) strategy with an easy ensemble classifier (EEC)-based or logistic regression (LR)-based model. The best LCS eligibility mechanisms, EEC-GIFT* and LR-GIFT*, were applied to the testing dataset, and their performance was compared with the 2021 US Preventive Services Task Force (USPSTF) criteria and the PLCOM2012 model. Mechanism fairness was evaluated using the equal opportunity difference (EOD) for developing lung cancer between Black and White smokers.
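To make the fairness metric concrete, the following is a minimal sketch of how an equal opportunity difference is commonly computed: the gap in true positive rate (sensitivity) between two groups. The function names, the sign convention (first group minus second), and the toy data are assumptions for illustration only; the abstract does not specify how the authors computed or averaged their EODs.

```python
from typing import Sequence


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual positives (developed lung cancer) flagged as eligible."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not flagged:
        raise ValueError("group has no positive cases")
    return sum(flagged) / len(flagged)


def equal_opportunity_difference(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[str],
    group_a: str,
    group_b: str,
) -> float:
    """TPR(group_a) - TPR(group_b); the sign convention is an assumption."""

    def tpr_for(label: str) -> float:
        idx = [i for i, g in enumerate(group) if g == label]
        return true_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )

    return tpr_for(group_a) - tpr_for(group_b)


# Hypothetical toy data, not taken from the PLCO trial.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]  # 1 = developed lung cancer
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 0, 1]  # 1 = flagged eligible for screening
group = ["White"] * 5 + ["Black"] * 5

eod = equal_opportunity_difference(y_true, y_pred, group, "White", "Black")
print(f"EOD = {eod:.2f}")
```

An EOD near zero means that people who go on to develop lung cancer are flagged as eligible at similar rates in both groups.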
Results: During training, LR-GIFT* and EEC-GIFT* were notably fairer than the LR and EEC models, without a large reduction in accuracy. During testing, the EEC-GIFT* (85.16% vs 78.08%, P < .001) and LR-GIFT* (85.98% vs 78.08%, P < .001) models significantly improved sensitivity over the 2021 USPSTF criteria without sacrificing specificity. The EEC-GIFT* (0.785 vs 0.788, P = .28) and LR-GIFT* (0.785 vs 0.788, P = .30) models showed area under the receiver operating characteristic curve (AUC) values similar to those of the PLCOM2012 model. While the average EODs between Black and White smokers were significant for the 2021 USPSTF criteria (0.0673, P < .001), PLCOM2012 (0.0566, P < .001), and LR-GIFT* (0.0081, P < .001), the EEC-GIFT* model was unbiased, with a nonsignificant EOD (0.0034, P = .07).
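One way to read "improved sensitivity without sacrificing specificity" is to pick a risk threshold for a score-based model that is at least as specific as a rule-based criterion, then compare sensitivities. The sketch below illustrates that idea under stated assumptions: the matching procedure, the function names, the target specificity, and the toy scores are all hypothetical, and the abstract does not say the authors selected thresholds this way.

```python
import math
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives that are flagged."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual negatives that are not flagged."""
    cleared = [1 - p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(cleared) / len(cleared)


def threshold_for_specificity(
    y_true: Sequence[int], scores: Sequence[float], target: float
) -> float:
    """Lowest observed score threshold whose specificity reaches the target.

    Specificity does not decrease as the threshold rises, so the first
    threshold that meets the target keeps sensitivity as high as possible.
    """
    for thr in sorted(set(scores)):
        y_pred = [int(s >= thr) for s in scores]
        if specificity(y_true, y_pred) >= target:
            return thr
    return math.inf  # no observed threshold reaches the target


# Hypothetical toy data: four non-cases followed by four cases.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.4, 0.7, 0.3, 0.6, 0.8, 0.9]

# Assumed specificity of a reference rule (illustrative value only).
target_specificity = 0.75

thr = threshold_for_specificity(y_true, scores, target_specificity)
y_pred = [int(s >= thr) for s in scores]
print(thr, sensitivity(y_true, y_pred), specificity(y_true, y_pred))
```

With the specificity held at or above the reference value, any difference in sensitivity reflects how well the model ranks future cases rather than how many people it flags.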
Conclusion: Our EEC-GIFT* LCS eligibility mechanism can significantly mitigate racial biases in eligibility determination without compromising its predictive performance.