Predicting Prenatal Depression and Assessing Model Bias Using Machine Learning Models

Yongchao Huang, Suzanne Alvernaz, Sage J. Kim, Pauline Maki, Yang Dai, Beatriz Peñalver Bernabé
Biological Psychiatry Global Open Science, volume 4, issue 6, Article 100376. Published 2024-08-14. DOI: 10.1016/j.bpsgos.2024.100376
https://www.sciencedirect.com/science/article/pii/S2667174324000892

Abstract


Background

Perinatal depression is one of the most common medical complications during pregnancy and the postpartum period, affecting 10% to 20% of pregnant individuals, with higher rates among Black and Latina women, who are also less likely to be diagnosed and treated. Machine learning (ML) models based on electronic medical records (EMRs) have effectively predicted postpartum depression in middle-class White women but have rarely included sufficient proportions of racial/ethnic minorities, which has contributed to biases in ML models. Our goal was to determine whether ML models could predict depression in early pregnancy in racial/ethnic minority women by leveraging EMR data.

Methods

We extracted EMRs from a large U.S. urban hospital serving mostly low-income Black and Hispanic women (n = 5875). Depressive symptom severity was assessed using the Patient Health Questionnaire-9, a self-report questionnaire. We investigated multiple ML classifiers, used Shapley additive explanations for model interpretation, and quantified prediction bias with 4 metrics: disparate impact, equal opportunity difference, and equalized odds (the standard deviations of true positives and of false positives, counted as 2 metrics).
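
The bias metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the choice of a reference group, the use of the population standard deviation across groups, and the toy labels are all assumptions made for illustration. Disparate impact is taken here as the ratio of positive-prediction rates (group vs. reference), and equal opportunity difference as the difference in true-positive rates.

```python
from statistics import pstdev


def rates(y_true, y_pred):
    """Return (selection rate, true-positive rate, false-positive rate).

    Assumes the group contains at least one positive and one negative label;
    otherwise a division by zero occurs.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return (tp + fp) / len(y_true), tp / pos, fp / neg


def bias_metrics(y_true, y_pred, group, reference):
    """Compare each group with a hypothetical reference group."""
    by_group = {}
    for g in sorted(set(group)):
        idx = [i for i, x in enumerate(group) if x == g]
        by_group[g] = rates([y_true[i] for i in idx], [y_pred[i] for i in idx])

    ref_sel, ref_tpr, _ = by_group[reference]
    report = {
        # Equalized odds summarized as the spread of TPR and FPR across groups
        # (population standard deviation is an assumption of this sketch).
        "tpr_sd": pstdev([r[1] for r in by_group.values()]),
        "fpr_sd": pstdev([r[2] for r in by_group.values()]),
    }
    for g, (sel, tpr, _) in by_group.items():
        if g != reference:
            report[f"disparate_impact[{g}]"] = sel / ref_sel
            report[f"equal_opportunity_diff[{g}]"] = tpr - ref_tpr
    return report


# Toy labels and predictions, invented for illustration (not study data).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

for name, value in bias_metrics(y_true, y_pred, group, reference="A").items():
    print(f"{name}: {value:.3f}")
```

A disparate impact near 1 and differences or standard deviations near 0 would indicate similar treatment across groups under these definitions; the abstract does not state which thresholds the study applied.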

Results

Although the best-performing ML model (elastic net) had low performance (area under the receiver operating characteristic curve [AUC] = 0.61), we identified known perinatal depression risk factors, such as unplanned pregnancy and being single, as well as underexplored factors, such as self-reported pain, lower prenatal vitamin intake, asthma, carrying a male fetus, and lower platelet levels. Despite the sample comprising mostly low-income minority women (54% Black, 27% Latina), the model performed worse for these communities (AUC: 0.57 for Black women and 0.59 for Latina women vs. 0.64 for White women).
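
The subgroup comparison reported above amounts to computing the area under the receiver operating characteristic curve separately for each group. The sketch below shows one way to do that, using the pairwise (Mann-Whitney) formulation of the area; the function names and the toy scores are assumptions for illustration and do not reproduce the study's data or pipeline.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case, counting ties as one half.
    Assumes at least one positive and one negative label.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(y_true, scores, group):
    """Return {group label: AUC computed on that group's cases only}."""
    result = {}
    for g in sorted(set(group)):
        idx = [i for i, x in enumerate(group) if x == g]
        result[g] = auc([y_true[i] for i in idx], [scores[i] for i in idx])
    return result


# Toy labels and predicted risk scores, invented for illustration.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.1, 0.6, 0.2, 0.4, 0.1]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

for g, value in auc_by_group(y_true, scores, group).items():
    print(f"group {g}: AUC = {value:.2f}")
```

The pairwise form is quadratic in group size, which is fine for a sketch; a rank-based implementation would be preferable for a cohort of several thousand records.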

Conclusions

EMR-based ML models could moderately predict early pregnancy depression but exhibited biased performance against low-income minority women.