Preprocessing of natural language process variables using a data-driven method improves the association with suicide risk in a large veterans affairs population

Computers in Biology and Medicine · Impact factor 6.3 · CAS Zone 2 (Medicine) · JCR Q1 (Biology) · Published 2025-05-01 (Epub 2025-03-05) · DOI: 10.1016/j.compbiomed.2025.109939
Siting Li, Maxwell Levis, Monica DiMambro, Weiyi Wu, Joshua Levy, Brian Shiner, Jiang Gui
Computers in Biology and Medicine, Volume 189, Article 109939 (2025)
Citations: 0

Abstract

Objective

Suicide risk assessment has historically relied heavily on clinical evaluations and patient self-reports. Natural language processing (NLP) of electronic health records (EHRs) provides an alternative approach for extracting risk predictors from clinical notes. Modeling NLP variables, however, is challenging because of zero inflation and skewed distributions. Therefore, we evaluated whether an adaptive-mixture-categorization (AMC) method could optimize the suicide risk predictive capacity of NLP data extracted from Veterans Affairs (VA) EHR notes.
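To see why zero inflation complicates conventional binning, consider quartile categorization of a hypothetical feature on which 70% of patients score exactly 0 (the counts are illustrative, not taken from the VA cohort). In this standard-library Python sketch, two of the three quartile cutpoints coincide at 0, so two quartile bins stay empty and every nonzero score, from 1 to 20, is lumped into a single top bin.

```python
from bisect import bisect_right
from collections import Counter
import statistics

# Hypothetical zero-inflated, right-skewed feature for 1,000 patients
# (illustrative counts only, not study data): 70% score exactly 0.
values = [0.0] * 700 + [1.0] * 150 + [2.0] * 80 + [5.0] * 50 + [20.0] * 20

# Quartile cutpoints: the 25th and 50th percentiles are both 0.
cuts = statistics.quantiles(values, n=4)
print("cutpoints:", cuts)

# Assign each value to a quartile bin 0..3; duplicate cutpoints leave bins empty.
bins = Counter(bisect_right(cuts, v) for v in values)
print("patients per bin:", dict(sorted(bins.items())))
```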

Methods

NLP variables for 25,342 patients were analyzed using the SÉANCE Python package. The AMC method was used to categorize each NLP measure into distinct groups chosen to maximize the between-category variance. Associations between suicide outcomes and the AMC-categorized NLP variables were compared with those of the original and the quantile-categorized NLP variables.
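The abstract does not detail how AMC selects its categories. As a stand-in for the stated criterion (maximizing between-category variance), the sketch below uses a different, well-known technique: Fisher's exact optimal one-dimensional partition, the dynamic program behind Jenks natural breaks. For a fixed number of categories k, minimizing the within-category sum of squares is equivalent to maximizing the between-category sum of squares, because the two always add up to the fixed total. This is not the authors' AMC implementation; the helper name `optimal_categories`, the choice k = 3, and the data are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def optimal_categories(values, k):
    """Hypothetical helper: split the sorted distinct values into k contiguous
    groups that minimize the within-group sum of squares (equivalently,
    maximize the between-group sum of squares). Returns each group's maximum."""
    counts = sorted(Counter(values).items())  # [(value, count), ...]
    m = len(counts)
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of distinct values")

    # Prefix sums of counts, weighted sums, and weighted squares.
    w, s, q = [0.0] * (m + 1), [0.0] * (m + 1), [0.0] * (m + 1)
    for i, (v, c) in enumerate(counts, start=1):
        w[i] = w[i - 1] + c
        s[i] = s[i - 1] + c * v
        q[i] = q[i - 1] + c * v * v

    def sse(a, b):
        # Within-group sum of squares for distinct values a..b-1.
        t = s[b] - s[a]
        return (q[b] - q[a]) - t * t / (w[b] - w[a])

    inf = float("inf")
    # cost[g][j]: minimal SSE when the first j distinct values form g groups.
    cost = [[inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, m + 1):
            for i in range(g - 1, j):
                c = cost[g - 1][i] + sse(i, j)
                if c < cost[g][j]:
                    cost[g][j], back[g][j] = c, i

    # Walk the back-pointers to recover each group's largest value.
    uppers, j = [], m
    for g in range(k, 0, -1):
        uppers.append(counts[j - 1][0])
        j = back[g][j]
    return uppers[::-1]


# Hypothetical zero-inflated, right-skewed feature (illustrative counts only).
values = [0.0] * 700 + [1.0] * 150 + [2.0] * 80 + [5.0] * 50 + [20.0] * 20
uppers = optimal_categories(values, k=3)
categories = Counter(bisect_left(uppers, v) for v in values)
print("upper bounds:", uppers)
print("patients per category:", dict(sorted(categories.items())))
```

On this example the partition keeps the scores 0 to 2 together (930 patients) and gives the sparse upper tail its own categories (5 and 20). Working on distinct values rather than individual observations guarantees that tied scores, such as the many zeros, always share a category.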

Results

AMC-categorized variables showed stronger associations with suicide risk than the other approaches, both in the full-cohort analysis and in sensitivity analyses using subsampling bootstrap. Additionally, over 90% of the NLP variables were significantly associated with suicide risk in univariate analyses, indicating the relevance of clinical notes to suicide prevention.
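The abstract does not name the association measure behind its univariate analyses. As one standard option, assumed here rather than taken from the paper, Pearson's chi-square test of independence can check whether a categorized NLP variable is associated with a binary suicide outcome. The sketch uses only the Python standard library; the helper names `chi2_independence` and `chi2_sf` and the counts are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1.
    Uses the regularized upper incomplete gamma recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)  # df = 2: Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # df = 1: Q(1/2, y) = erfc(sqrt(y))
    while 2 * a < df:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.
    Returns (statistic, degrees of freedom, p-value); no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows are categories of one NLP variable,
# columns are (no suicide outcome, suicide outcome).
table = [[900, 30], [400, 25], [150, 20]]
stat, df, p = chi2_independence(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3g}")
```

The p-value routine covers only integer degrees of freedom, which is all a contingency table needs. In practice a statistics library would usually be used instead; note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its results can differ from this uncorrected sketch.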

Conclusion

AMC-based categorization substantially enhanced the ability of NLP variables extracted from clinical text to predict suicide risk. Transforming skewed NLP data with the AMC method holds promise for improving risk prediction models.
Source journal: Computers in Biology and Medicine
Subject category: Engineering and Technology; Engineering, Biomedical
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.