从临床笔记中大规模识别健康的社会和行为决定因素:潜在语义索引和生成式预训练转换器 (GPT) 模型的比较。

IF 4.3 3区 材料科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC ACS Applied Electronic Materials Pub Date : 2024-10-10 DOI:10.1186/s12911-024-02705-x
Sujoy Roy, Shane Morrell, Lili Zhao, Ramin Homayouni
{"title":"从临床笔记中大规模识别健康的社会和行为决定因素:潜在语义索引和生成式预训练转换器 (GPT) 模型的比较。","authors":"Sujoy Roy, Shane Morrell, Lili Zhao, Ramin Homayouni","doi":"10.1186/s12911-024-02705-x","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Social and behavioral determinants of health (SBDH) are associated with a variety of health and utilization outcomes, yet these factors are not routinely documented in the structured fields of electronic health records (EHR). The objective of this study was to evaluate different machine learning approaches for detection of SBDH from the unstructured clinical notes in the EHR.</p><p><strong>Methods: </strong>Latent Semantic Indexing (LSI) was applied to 2,083,180 clinical notes corresponding to 46,146 patients in the MIMIC-III dataset. Using LSI, patients were ranked based on conceptual relevance to a set of keywords (lexicons) pertaining to 15 different SBDH categories. For Generative Pretrained Transformer (GPT) models, API requests were made with a Python script to connect to the OpenAI services in Azure, using gpt-3.5-turbo-1106 and gpt-4-1106-preview models. Prediction of SBDH categories were performed using a logistic regression model that included age, gender, race and SBDH ICD-9 codes.</p><p><strong>Results: </strong>LSI retrieved patients according to 15 SBDH domains, with an overall average PPV <math><mo>≥</mo></math> 83%. Using manually curated gold standard (GS) sets for nine SBDH categories, the macro-F1 score of LSI (0.74) was better than ICD-9 (0.71) and GPT-3.5 (0.54), but lower than GPT-4 (0.80). Due to document size limitations, only a subset of the GS cases could be processed by GPT-3.5 (55.8%) and GPT-4 (94.2%), compared to LSI (100%). Using common GS subsets for nine different SBDH categories, the macro-F1 of ICD-9 combined with either LSI (mean 0.88, 95% CI 0.82-0.93), GPT-3.5 (0.86, 0.82-0.91) or GPT-4 (0.88, 0.83-0.94) was not significantly different. After including age, gender, race and ICD-9 in a logistic regression model, the AUC for prediction of six out of the nine SBDH categories was higher for LSI compared to GPT-4.0.</p><p><strong>Conclusions: </strong>These results demonstrate that the LSI approach performs comparable to more recent large language models, such as GPT-3.5 and GPT-4.0, when using the same set of documents. Importantly, LSI is robust, deterministic, and does not have document-size limitations or cost implications, which make it more amenable to real-world applications in health systems.</p>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11465786/pdf/","citationCount":"0","resultStr":"{\"title\":\"Large-scale identification of social and behavioral determinants of health from clinical notes: comparison of Latent Semantic Indexing and Generative Pretrained Transformer (GPT) models.\",\"authors\":\"Sujoy Roy, Shane Morrell, Lili Zhao, Ramin Homayouni\",\"doi\":\"10.1186/s12911-024-02705-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Social and behavioral determinants of health (SBDH) are associated with a variety of health and utilization outcomes, yet these factors are not routinely documented in the structured fields of electronic health records (EHR). The objective of this study was to evaluate different machine learning approaches for detection of SBDH from the unstructured clinical notes in the EHR.</p><p><strong>Methods: </strong>Latent Semantic Indexing (LSI) was applied to 2,083,180 clinical notes corresponding to 46,146 patients in the MIMIC-III dataset. Using LSI, patients were ranked based on conceptual relevance to a set of keywords (lexicons) pertaining to 15 different SBDH categories. For Generative Pretrained Transformer (GPT) models, API requests were made with a Python script to connect to the OpenAI services in Azure, using gpt-3.5-turbo-1106 and gpt-4-1106-preview models. Prediction of SBDH categories were performed using a logistic regression model that included age, gender, race and SBDH ICD-9 codes.</p><p><strong>Results: </strong>LSI retrieved patients according to 15 SBDH domains, with an overall average PPV <math><mo>≥</mo></math> 83%. Using manually curated gold standard (GS) sets for nine SBDH categories, the macro-F1 score of LSI (0.74) was better than ICD-9 (0.71) and GPT-3.5 (0.54), but lower than GPT-4 (0.80). Due to document size limitations, only a subset of the GS cases could be processed by GPT-3.5 (55.8%) and GPT-4 (94.2%), compared to LSI (100%). Using common GS subsets for nine different SBDH categories, the macro-F1 of ICD-9 combined with either LSI (mean 0.88, 95% CI 0.82-0.93), GPT-3.5 (0.86, 0.82-0.91) or GPT-4 (0.88, 0.83-0.94) was not significantly different. After including age, gender, race and ICD-9 in a logistic regression model, the AUC for prediction of six out of the nine SBDH categories was higher for LSI compared to GPT-4.0.</p><p><strong>Conclusions: </strong>These results demonstrate that the LSI approach performs comparable to more recent large language models, such as GPT-3.5 and GPT-4.0, when using the same set of documents. Importantly, LSI is robust, deterministic, and does not have document-size limitations or cost implications, which make it more amenable to real-world applications in health systems.</p>\",\"PeriodicalId\":3,\"journal\":{\"name\":\"ACS Applied Electronic Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11465786/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Electronic Materials\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1186/s12911-024-02705-x\",\"RegionNum\":3,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12911-024-02705-x","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

摘要

背景:健康的社会和行为决定因素(SBDH)与各种健康和使用结果相关,但这些因素并未在电子健康记录(EHR)的结构化字段中得到常规记录。本研究的目的是评估不同的机器学习方法,以便从电子病历中的非结构化临床笔记中检测出 SBDH:方法:对 MIMIC-III 数据集中 46,146 名患者的 2,083,180 份临床笔记应用潜语义索引(LSI)。使用 LSI,根据与 15 个不同的 SBDH 类别相关的一组关键词(词库)的概念相关性对患者进行排序。对于生成式预训练转换器(GPT)模型,使用 gpt-3.5-turbo-1106 和 gpt-4-1106-preview 模型,通过 Python 脚本连接到 Azure 中的 OpenAI 服务,提出 API 请求。使用包含年龄、性别、种族和 SBDH ICD-9 代码的逻辑回归模型对 SBDH 类别进行预测:结果:LSI 可根据 15 个 SBDH 领域检索患者,总体平均 PPV ≥ 83%。使用人工策划的九个 SBDH 类别的金标准(GS)集,LSI 的宏观-F1 得分(0.74)优于 ICD-9(0.71)和 GPT-3.5(0.54),但低于 GPT-4(0.80)。由于文档大小的限制,与 LSI(100%)相比,GPT-3.5(55.8%)和 GPT-4 (94.2%)只能处理部分 GS 病例。使用九种不同的 SBDH 类别的通用 GS 子集,ICD-9 与 LSI(平均值 0.88,95% CI 0.82-0.93)、GPT-3.5(0.86,0.82-0.91)或 GPT-4 (0.88,0.83-0.94)相结合的宏 F1 没有显著差异。将年龄、性别、种族和 ICD-9 纳入逻辑回归模型后,与 GPT-4.0 相比,LSI 预测 9 个 SBDH 类别中 6 个类别的 AUC 更高:这些结果表明,在使用相同的文档集时,LSI 方法的性能可与 GPT-3.5 和 GPT-4.0 等最新的大型语言模型相媲美。重要的是,LSI 具有稳健性、确定性,而且没有文档大小的限制或成本影响,因此更适合在医疗系统中实际应用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Large-scale identification of social and behavioral determinants of health from clinical notes: comparison of Latent Semantic Indexing and Generative Pretrained Transformer (GPT) models.

Background: Social and behavioral determinants of health (SBDH) are associated with a variety of health and utilization outcomes, yet these factors are not routinely documented in the structured fields of electronic health records (EHR). The objective of this study was to evaluate different machine learning approaches for detection of SBDH from the unstructured clinical notes in the EHR.

Methods: Latent Semantic Indexing (LSI) was applied to 2,083,180 clinical notes corresponding to 46,146 patients in the MIMIC-III dataset. Using LSI, patients were ranked based on conceptual relevance to a set of keywords (lexicons) pertaining to 15 different SBDH categories. For Generative Pretrained Transformer (GPT) models, API requests were made with a Python script to connect to the OpenAI services in Azure, using gpt-3.5-turbo-1106 and gpt-4-1106-preview models. Prediction of SBDH categories were performed using a logistic regression model that included age, gender, race and SBDH ICD-9 codes.

Results: LSI retrieved patients according to 15 SBDH domains, with an overall average PPV 83%. Using manually curated gold standard (GS) sets for nine SBDH categories, the macro-F1 score of LSI (0.74) was better than ICD-9 (0.71) and GPT-3.5 (0.54), but lower than GPT-4 (0.80). Due to document size limitations, only a subset of the GS cases could be processed by GPT-3.5 (55.8%) and GPT-4 (94.2%), compared to LSI (100%). Using common GS subsets for nine different SBDH categories, the macro-F1 of ICD-9 combined with either LSI (mean 0.88, 95% CI 0.82-0.93), GPT-3.5 (0.86, 0.82-0.91) or GPT-4 (0.88, 0.83-0.94) was not significantly different. After including age, gender, race and ICD-9 in a logistic regression model, the AUC for prediction of six out of the nine SBDH categories was higher for LSI compared to GPT-4.0.

Conclusions: These results demonstrate that the LSI approach performs comparable to more recent large language models, such as GPT-3.5 and GPT-4.0, when using the same set of documents. Importantly, LSI is robust, deterministic, and does not have document-size limitations or cost implications, which make it more amenable to real-world applications in health systems.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
7.20
自引率
4.30%
发文量
567
期刊最新文献
Hyperbaric oxygen treatment promotes tendon-bone interface healing in a rabbit model of rotator cuff tears. Oxygen-ozone therapy for myocardial ischemic stroke and cardiovascular disorders. Comparative study on the anti-inflammatory and protective effects of different oxygen therapy regimens on lipopolysaccharide-induced acute lung injury in mice. Heme oxygenase/carbon monoxide system and development of the heart. Hyperbaric oxygen for moderate-to-severe traumatic brain injury: outcomes 5-8 years after injury.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1