Emerging algorithmic bias: fairness drift as the next dimension of model maintenance and sustainability
Sharon E Davis, Chad Dorn, Daniel J Park, Michael E Matheny
Journal of the American Medical Informatics Association. Published 2025-03-13. DOI: 10.1093/jamia/ocaf039 (https://doi.org/10.1093/jamia/ocaf039)
Abstract
Objectives: While performance drift of clinical prediction models is well-documented, the potential for algorithmic biases to emerge post-deployment has had limited characterization. A better understanding of how temporal model performance may shift across subpopulations is required to incorporate fairness drift into model maintenance strategies.
Materials and methods: We explore fairness drift in a national population over 11 years, with and without model maintenance aimed at sustaining population-level performance. We trained random forest models predicting 30-day post-surgical readmission, mortality, and pneumonia using 2013 data from US Department of Veterans Affairs facilities. We evaluated performance quarterly from 2014 to 2023 by self-reported race and sex. We estimated discrimination, calibration, and accuracy, and operationalized fairness using metric parity measured as the gap between disadvantaged and advantaged groups.
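To make the metric-parity construction concrete, the sketch below (not the authors' code) computes quarterly fairness gaps as the difference in a performance metric between a disadvantaged and an advantaged group. The column names, group labels, and the choice of AUROC as the metric are illustrative assumptions; the study also examined calibration and accuracy, for which the same per-quarter, per-group pattern applies.

```python
# Minimal sketch, assuming a scored dataset with columns 'date' (datetime),
# 'y_true' (observed outcome), 'y_pred' (predicted risk), and a group column.
# Group labels and the AUROC metric are assumptions for illustration only.
import pandas as pd
from sklearn.metrics import roc_auc_score

def quarterly_fairness_gaps(df: pd.DataFrame,
                            group_col: str = "race",
                            disadvantaged: str = "Black",
                            advantaged: str = "White") -> pd.Series:
    """Return AUROC(disadvantaged) - AUROC(advantaged) for each calendar quarter."""
    df = df.assign(quarter=df["date"].dt.to_period("Q"))
    gaps = {}
    for quarter, q in df.groupby("quarter"):
        auc = {}
        for label in (disadvantaged, advantaged):
            g = q[q[group_col] == label]
            # AUROC is undefined if a group lacks both outcome classes in a quarter.
            if g["y_true"].nunique() == 2:
                auc[label] = roc_auc_score(g["y_true"], g["y_pred"])
        if len(auc) == 2:
            gaps[quarter] = auc[disadvantaged] - auc[advantaged]
    return pd.Series(gaps, name="auroc_gap")
```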
Results: Our cohort included 1 739 666 surgical cases. We observed fairness drift in both the original and temporally updated models. Model updating had a larger impact on overall performance than on fairness gaps. During periods of stable fairness, updating models at the population level increased, decreased, or had no effect on fairness gaps. During periods of fairness drift, updating models restored fairness in some cases and exacerbated fairness gaps in others.
Discussion: This exploratory study highlights that algorithmic fairness cannot be assured through one-time assessments during model development. Temporal changes in fairness may take multiple forms and interact with model updating strategies in unanticipated ways.
Conclusion: Equitable and sustainable clinical artificial intelligence deployments will require novel methods to monitor algorithmic fairness, detect emerging bias, and adopt model updates that promote fairness.
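As one hypothetical illustration of what such monitoring could look like (not a method from the paper), the sketch below flags quarters whose fairness gap has widened beyond a baseline level by a chosen tolerance; the baseline window and tolerance are arbitrary illustration values.

```python
# Minimal sketch, assuming a Series of quarterly fairness gaps such as the one
# produced by quarterly_fairness_gaps() above. Threshold values are illustrative.
import pandas as pd

def flag_fairness_drift(gaps: pd.Series,
                        baseline_quarters: int = 4,
                        tolerance: float = 0.02) -> pd.Series:
    """Mark quarters whose absolute gap exceeds the mean absolute gap of the
    first `baseline_quarters` quarters by more than `tolerance`."""
    baseline = gaps.iloc[:baseline_quarters].abs().mean()
    return gaps.abs() > (baseline + tolerance)
```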
Journal introduction:
JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.