Investigation of bias in the automated assessment of school violence

IF 4.0 | CAS Zone 2 (Medicine) | JCR Q2 (Computer Science, Interdisciplinary Applications) | Journal of Biomedical Informatics | Pub Date: 2024-08-15 | DOI: 10.1016/j.jbi.2024.104709
Lara J. Kanbar, Anagh Mishra, Alexander Osborn, Andrew Cifuentes, Jennifer Combs, Michael Sorter, Drew Barzman, Judith W. Dexheimer
{"title":"Investigation of bias in the automated assessment of school violence","authors":"Lara J. Kanbar ,&nbsp;Anagh Mishra ,&nbsp;Alexander Osborn ,&nbsp;Andrew Cifuentes ,&nbsp;Jennifer Combs ,&nbsp;Michael Sorter ,&nbsp;Drew Barzman ,&nbsp;Judith W. Dexheimer","doi":"10.1016/j.jbi.2024.104709","DOIUrl":null,"url":null,"abstract":"<div><h3>Objectives</h3><p>Natural language processing and machine learning have the potential to lead to biased predictions. We designed a novel Automated RIsk Assessment (ARIA) machine learning algorithm that assesses risk of violence and aggression in adolescents using natural language processing of transcribed student interviews. This work evaluated the possible sources of bias in the study design and the algorithm, tested how much of a prediction was explained by demographic covariates, and investigated the misclassifications based on demographic variables.</p></div><div><h3>Methods</h3><p>We recruited students 10–18 years of age and enrolled in middle or high schools in Ohio, Kentucky, Indiana, and Tennessee. The reference standard outcome was determined by a forensic psychiatrist as either a “high” or “low” risk level. ARIA used L2-regularized logistic regression to predict a risk level for each student using contextual and semantic features. We conducted three analyses: a PROBAST analysis of risk in study design; analysis of demographic variables as covariates; and a prediction analysis. Covariates were included in the linear regression analyses and comprised of race, sex, ethnicity, household education, annual household income, age at the time of visit, and utilization of public assistance.</p></div><div><h3>Results</h3><p>We recruited 412 students from 204 schools. ARIA performed with an AUC of 0.92, sensitivity of 71%, NPV of 77%, and specificity of 95%. Of these, 387 students with complete demographic information were included in the analysis. Individual linear regressions resulted in a coefficient of determination less than 0.08 across all demographic variables. When using all demographic variables to predict ARIA’s risk assessment score, the multiple linear regression model resulted in a coefficient of determination of 0.189. ARIA performed with a lower False Negative Rate (FNR) of 15.2% (CI [0 – 40]) for the Black subgroup and 12.7%, CI [0 – 41.4] for Other races, compared to an FNR of 26.1% (CI [14.1 – 41.8]) in the White subgroup.</p></div><div><h3>Conclusions</h3><p>Bias assessment is needed to address shortcomings within machine learning. In our work, student race, ethnicity, sex, use of public assistance, and annual household income did not explain ARIA’s risk assessment score of students. ARIA will continue to be evaluated regularly with increased subject recruitment.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104709"},"PeriodicalIF":4.0000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Informatics","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1532046424001278","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Objectives

Natural language processing and machine learning have the potential to produce biased predictions. We designed a novel Automated RIsk Assessment (ARIA) machine learning algorithm that assesses the risk of violence and aggression in adolescents using natural language processing of transcribed student interviews. This work evaluated possible sources of bias in the study design and the algorithm, tested how much of a prediction was explained by demographic covariates, and investigated misclassifications across demographic variables.

Methods

We recruited students 10–18 years of age enrolled in middle or high schools in Ohio, Kentucky, Indiana, and Tennessee. The reference standard outcome, determined by a forensic psychiatrist, was either a "high" or "low" risk level. ARIA used L2-regularized logistic regression to predict a risk level for each student from contextual and semantic features. We conducted three analyses: a PROBAST analysis of risk of bias in the study design; an analysis of demographic variables as covariates; and a prediction analysis. The covariates included in the linear regression analyses were race, sex, ethnicity, household education, annual household income, age at the time of visit, and utilization of public assistance.
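The paper does not publish ARIA's implementation; the following is a minimal sketch of the modeling approach described above (an L2-regularized logistic regression over features extracted from interview text), using scikit-learn. TF-IDF stands in for the paper's "contextual and semantic" features, which are not specified here, and all variable names are illustrative.

```python
# Minimal sketch, assuming TF-IDF text features as a stand-in for the
# study's contextual/semantic features (not the authors' actual code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

risk_model = Pipeline([
    # Unigram/bigram TF-IDF features from transcribed interviews.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    # L2 regularization is scikit-learn's default penalty; C controls
    # its inverse strength.
    ("clf", LogisticRegression(penalty="l2", C=1.0, max_iter=1000)),
])

# transcripts: list[str] of interview text; labels: 1 = "high" risk,
# 0 = "low" risk, per the forensic psychiatrist's reference standard.
# risk_model.fit(transcripts, labels)
# scores = risk_model.predict_proba(new_transcripts)[:, 1]
```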

Results

We recruited 412 students from 204 schools; of these, 387 students with complete demographic information were included in the analysis. ARIA performed with an AUC of 0.92, sensitivity of 71%, NPV of 77%, and specificity of 95%. Individual linear regressions yielded a coefficient of determination below 0.08 for every demographic variable. When all demographic variables were used together to predict ARIA's risk assessment score, the multiple linear regression model yielded a coefficient of determination of 0.189. ARIA had a lower false negative rate (FNR) of 15.2% (CI [0–40]) for the Black subgroup and 12.7% (CI [0–41.4]) for other races, compared with an FNR of 26.1% (CI [14.1–41.8]) in the White subgroup.
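For concreteness, the two bias checks reported above (coefficients of determination from the covariate regressions, and per-subgroup false negative rates) can be sketched as follows. This assumes a hypothetical pandas DataFrame with illustrative column names (aria_score, true_label, predicted_label, race, and so on); it is not the authors' code or data.

```python
# Sketch of the bias analyses, assuming a DataFrame `df` with one row
# per student and illustrative column names.
import pandas as pd
import statsmodels.api as sm

COVARIATES = ["race", "sex", "ethnicity", "household_education",
              "annual_household_income", "age_at_visit", "public_assistance"]

def covariate_r_squared(df: pd.DataFrame) -> None:
    """R^2 of ARIA's score regressed on each covariate alone, then on all together."""
    for cov in COVARIATES:
        # One-hot encode categorical covariates; numeric ones pass through.
        X = sm.add_constant(pd.get_dummies(df[[cov]], drop_first=True, dtype=float))
        print(cov, sm.OLS(df["aria_score"], X).fit().rsquared)
    X_all = sm.add_constant(pd.get_dummies(df[COVARIATES], drop_first=True, dtype=float))
    print("all covariates:", sm.OLS(df["aria_score"], X_all).fit().rsquared)

def subgroup_fnr(df: pd.DataFrame) -> pd.Series:
    """False negative rate, FN / (FN + TP), within each racial subgroup."""
    positives = df[df["true_label"] == 1]  # reference standard: high risk
    return (positives["predicted_label"] == 0).groupby(positives["race"]).mean()
```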

Conclusions

Bias assessment is needed to address shortcomings within machine learning. In our work, student race, ethnicity, sex, use of public assistance, and annual household income did not explain ARIA's risk assessment scores. ARIA will continue to be evaluated regularly as subject recruitment increases.


Source Journal

Journal of Biomedical Informatics (Medicine – Computer Science, Interdisciplinary Applications)
CiteScore: 8.90
Self-citation rate: 6.70%
Articles per year: 243
Review time: 32 days
About the Journal

The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images.

System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.
Latest Articles in This Journal

Enhanced heart failure mortality prediction through model-independent hybrid feature selection and explainable machine learning
Precision Drug Repurposing (PDR): Patient-level modeling and prediction combining foundational knowledge graph with biobank data
Enhancing clinical data warehousing with provenance data to support longitudinal analyses and large file management: The gitOmmix approach for genomic and image data.
From GPT to DeepSeek: Significant gaps remain in realizing AI in healthcare.
Improving entity recognition using ensembles of deep learning and fine-tuned large language models: A case study on adverse event extraction from VAERS and social media.