JAMIA Open: Latest Articles

Using routinely available electronic health record data elements to develop and validate a digital divide risk score.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-02-04 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooaf004
Jamie M Faro, Emily Obermiller, Corey Obermiller, Katy E Trinkley, Garth Wright, Rajani S Sadasivam, Kristie L Foley, Sarah L Cutrona, Thomas K Houston

Background: Digital health (patient portals, remote monitoring devices, video visits) is a routine part of health care, though the digital divide may affect access.

Objectives: To test and validate an electronic health record (EHR) screening tool to identify patients at risk of the digital divide.

Materials and methods: We conducted a retrospective EHR data extraction and cross-sectional survey of participants within 1 health care system. We identified 4 potential digital divide markers from the EHR: (1) mobile phone number, (2) email address, (3) active patient portal, and (4) >2 patient portal logins in the last year. We mailed surveys to patients at higher risk (missing all 4 markers), intermediate risk (missing 1-3 markers), or lower risk (missing no markers). Combining EHR and survey data, we summarized the markers into risk scores and evaluated their association with patients' report of lack of Internet access. Then, we assessed the association of EHR markers and eHealth Literacy Scale survey outcomes.
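The three-tier grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `EhrMarkers`, its field names, and the function names are hypothetical and do not come from the study; only the four markers and the tier rule (missing all 4, missing 1-3, missing none) are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class EhrMarkers:
    """Hypothetical container for the 4 EHR markers named in the abstract."""
    has_mobile_phone: bool
    has_email: bool
    portal_active: bool
    portal_logins_last_year: int


def missing_marker_count(m: EhrMarkers) -> int:
    """Count how many of the 4 markers are absent for one patient."""
    present = [
        m.has_mobile_phone,
        m.has_email,
        m.portal_active,
        m.portal_logins_last_year > 2,  # marker 4: more than 2 logins in the last year
    ]
    return sum(1 for p in present if not p)


def risk_tier(m: EhrMarkers) -> str:
    """Map the missing-marker count to the tiers used for survey mailing."""
    missing = missing_marker_count(m)
    if missing == 4:
        return "higher"
    if missing == 0:
        return "lower"
    return "intermediate"
```

For example, a patient with a mobile number and email on file but no active portal would be missing two markers and fall into the intermediate tier.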

Results: A total of 249 patients (39.4%) completed the survey (53% >65 years, 51% female, 50% minority race, 55% rural/small town residents, 46% private insurance, 45% Medicare). Individually, the 4 EHR markers had high sensitivity (range 81%-95%) and specificity (range 65%-79%) compared with survey responses. The EHR marker-based score (high risk, intermediate risk, low risk) predicted absence of Internet access (receiver operating characteristic c-statistic=0.77). Mean digital health literacy scores significantly decreased as EHR marker digital divide risk increased (P<.001).
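For readers less familiar with the metrics, sensitivity and specificity follow directly from the counts of a 2x2 comparison between an EHR marker and the survey response. The sketch below uses the standard definitions; the function names are illustrative, and no counts from the study are reproduced here because the abstract reports only the resulting ranges.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of survey-confirmed positives that the EHR marker also flags."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive cases to evaluate")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of survey-confirmed negatives that the EHR marker also clears."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative cases to evaluate")
    return true_neg / total
```

Which survey answer counts as the "positive" class is a design choice the abstract does not spell out, so it is left to the caller here.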

Discussion: Each of the 4 EHR markers (cell phone, email address, patient portal active, and patient portal actively used), compared with self-report, yielded high levels of sensitivity, specificity, and overall accuracy.

Conclusion: Using these markers, health care systems could target interventions and implementation strategies to support equitable patient access to digital health.

JAMIA Open 8(1): ooaf004. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11792649/pdf/
Citations: 0
Leveraging deep learning to detect stance in Spanish tweets on COVID-19 vaccination.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-01-31 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooaf007
Guillermo Blanco, Rubén Yáñez Martínez, Anália Lourenço

Objectives: The automatic detection of stance on social media is an important task for public health applications, especially in the context of health crises. Unfortunately, existing models are typically trained on English corpora. Considering the benefits of extending research to other widely spoken languages, the goal of this study is to develop stance detection models for social media posts in Spanish.

Materials and methods: A corpus of 6170 tweets about COVID-19 vaccination, posted between March 1, 2020 and January 4, 2022, was manually annotated by native speakers. Traditional predictive models were compared with deep learning models to ascertain a baseline performance for the detection of stance in Spanish tweets. The evaluation focused on the ability of multilingual and language-specific embeddings to contextualize the topic of those short texts adequately.

Results: The BERT-Multi+BiLSTM combination yielded the best results (macroaveraged F1 and Matthews correlation coefficient scores of 0.86 and 0.79, respectively; interpolated area under the receiver operating characteristic curve [AUC] of 0.95 for tweets against vaccination, 0.85 for tweets in favor of vaccination, and 0.97 for tweets containing no stance information), closely followed by the BETO+BiLSTM and RoBERTa BNE-LSTM Spanish models and the term frequency-inverse document frequency+SVM model (average AUC decrease of 0.01). The main differentiating factor among these models was the ability to predict tweets against vaccination.
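As a reminder of what the macroaveraged F1 reported above measures, the following sketch computes it from scratch for a 3-class stance task. The label strings and the tiny example are made up for illustration; this is the textbook definition (unweighted mean of per-class F1), not the authors' evaluation code.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only: "against", "favor", "none"
example_true = ["against", "favor", "none", "none"]
example_pred = ["against", "none", "none", "none"]
example_score = macro_f1(example_true, example_pred)
```

Because every class contributes equally, a model that misses a minority class (such as tweets in favor of vaccination in this toy example) is penalized even if overall accuracy stays high.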

Discussion: The BERT-Multi+BiLSTM model outperformed the other models in terms of per-class prediction capacity. The main assumption is that language-specific embeddings do not outperform multilingual embeddings or TF-IDF features because of the context of the topic. The inherent context of BERT or RoBERTa embeddings is general, so these embeddings are not familiar with the slang commonly used on Twitter and, more specifically, during the pandemic.

Conclusion: The best-performing model detects tweet stance with performance high enough to ensure its usefulness for public health applications, namely awareness campaigns, misinformation detection, and other early intervention and prevention actions seeking to improve an individual's well-being based on self-reported experiences and opinions. The dataset and code of the study are available on GitHub.

JAMIA Open 8(1): ooaf007. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11854073/pdf/
Citations: 0
pyDeid: an improved, fast, flexible, and generalizable rule-based approach for deidentification of free-text medical records.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-01-22 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooae152
Vaakesan Sundrelingam, Shireen Parimoo, Frances Pogacar, Radha Koppula, Saeha Shin, Chloe Pou-Prom, Surain B Roberts, Amol A Verma, Fahad Razak

Objectives: Deidentification of personally identifiable information in free-text clinical data is fundamental to making these data broadly available for research. However, there are gaps in the deidentification landscape with regard to the functionality and flexibility of extant tools, as well as suboptimal tradeoffs between deidentification accuracy and speed. To address these gaps and tradeoffs, we develop pyDeid, a new Python-based deidentification software package.

Materials and methods: pyDeid uses a combination of regular expression-based rules, fixed exclusion lists and inclusion lists to deidentify free-text data. Additional configurations of pyDeid include optional named entity recognition and custom name lists. We measure its deidentification performance and speed on 700 admission notes from a Canadian hospital, the publicly available n2c2 benchmark dataset of American discharge notes, as well as a synthetic dataset of artificial intelligence (AI) generated admission notes. We also compare its performance with the Physionet De-identification Software and the popular open-source Philter tool.
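To give a feel for what a rule-based approach with regular expressions and a custom name list looks like, here is a deliberately small sketch using only Python's standard `re` module. It is not pyDeid's API or rule set: the patterns, the placeholder tags, and the function `deidentify` are assumptions made up for this illustration, and real tools cover far more identifier types and edge cases.

```python
import re

# Hypothetical, simplified patterns (not taken from pyDeid)
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")  # e.g. 416-555-0100
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")     # e.g. 2024-01-05


def deidentify(text: str, names: tuple[str, ...] = ()) -> str:
    """Replace phone numbers, ISO dates, and listed names with placeholder tags."""
    text = PHONE.sub("<PHONE>", text)
    text = ISO_DATE.sub("<DATE>", text)
    for name in names:
        # Whole-word, case-insensitive match against the custom name list
        pattern = re.compile(rf"\b{re.escape(name)}\b", flags=re.IGNORECASE)
        text = pattern.sub("<NAME>", text)
    return text


note = "Seen by Dr. Smith on 2024-01-05, call 416-555-0100."
clean = deidentify(note, names=("Smith",))
```

The tradeoff the abstract describes shows up even here: broader patterns raise recall but risk redacting clinically meaningful text, which is where exclusion lists come in.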

Results: Different configurations of pyDeid outperformed other tools on various metrics, with a "best" accuracy value of 0.988, best precision of 0.889, best recall of 0.950, and best F1 score of 0.904. All configurations of pyDeid were significantly faster than Philter and Physionet De-identification Software, with the fastest deidentification speed of 0.48 s per note.

Discussion and conclusions: pyDeid allows the flexibility to prioritize between performance and speed, as well as precision and recall, while addressing some of the gaps in functionality left by other tools. pyDeid is also generalizable to domains outside of clinical data and can be further customized for specific contexts or for particular workflows.

JAMIA Open 8(1): ooae152. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11752853/pdf/
Citations: 0
Evaluating dimensionality reduction of comorbidities for predictive modeling in individuals with neurofibromatosis type 1.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-01-22 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooae157
Aditi Gupta, Ethan Hillis, Inez Y Oh, Stephanie M Morris, Zach Abrams, Randi E Foraker, David H Gutmann, Philip R O Payne

Objective: Dimensionality reduction techniques aim to enhance the performance of machine learning (ML) models by reducing noise and mitigating overfitting. We sought to compare the effect of different dimensionality reduction methods for comorbidity features extracted from electronic health records (EHRs) on the performance of ML models for predicting the development of various sub-phenotypes in children with Neurofibromatosis type 1 (NF1).

Materials and methods: EHR-derived data from pediatric subjects with a confirmed clinical diagnosis of NF1 were used to create 10 unique comorbidity code-derived feature sets by incorporating dimensionality reduction techniques using raw International Classification of Diseases codes, Clinical Classifications Software Refined, and Phecode mapping schemes. We compared the performance of logistic regression, XGBoost, and random forest models utilizing each feature set.
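The idea of reducing dimensionality by aggregating diagnosis codes can be illustrated with a simple stand-in: truncating ICD-10-style codes to their 3-character category before building binary features. This is an assumption chosen for brevity; it is not the Clinical Classifications Software Refined or Phecode mapping used in the study, and the patient IDs and codes below are invented for the example.

```python
def aggregate_codes(codes: set[str]) -> set[str]:
    """Collapse each code to its 3-character category (e.g. G40.909 -> G40)."""
    return {code.split(".")[0][:3] for code in codes}


def binary_features(patient_codes: dict[str, set[str]]) -> tuple[list[str], dict[str, list[int]]]:
    """Build a 0/1 feature row per patient over the sorted code vocabulary."""
    vocab = sorted(set().union(*patient_codes.values()))
    rows = {
        pid: [1 if code in codes else 0 for code in vocab]
        for pid, codes in patient_codes.items()
    }
    return vocab, rows


# Hypothetical patients and codes, for illustration only
raw = {
    "p1": {"G40.909", "F90.0"},
    "p2": {"G40.109"},
}
raw_vocab, raw_rows = binary_features(raw)
agg_vocab, agg_rows = binary_features({pid: aggregate_codes(c) for pid, c in raw.items()})
```

In this toy case the raw representation has three columns and the aggregated one has two, with both patients now sharing the `G40` feature. The same mechanism, pushed too far, produces the over-aggregation and information loss the results describe.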

Results: XGBoost-based predictive models were most successful at predicting NF1 sub-phenotypes. Overall, features based on domain knowledge-informed mapping schema performed better than unsupervised feature reduction methods. High-level features exhibited the worst performance across models and outcomes, suggesting excessive information loss with over-aggregation of features.

Discussion: Model performance is significantly impacted by dimensionality reduction techniques and varies by specific ML algorithm and outcome being predicted. Automated methods using existing knowledge and ontology databases can effectively aggregate features extracted from EHRs.

Conclusion: Dimensionality reduction through feature aggregation can enhance the performance of ML models, particularly in high-dimensional datasets with small sample sizes, which are common in EHR-based health applications. However, if not carefully optimized, it can lead to information loss and data oversimplification, potentially degrading model performance.

JAMIA Open 8(1): ooae157. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11752863/pdf/
Citations: 0
A scoping review of the reporting quality of reviews of commercially and publicly available mobile health apps.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-01-13 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooae159
Norina Gasteiger, Gill Norman, Rebecca Grainger, Sabine N van der Veer, Lisa McGarrigle, Debra Jones, Charlotte Eost-Telling, Amy Vercell, Claire R Ford, Syed Mustafa Ali, Kate Law, Qimeng Zhao, Matthew Byerly, Chunhu Shi, Alan Davies, Alex Hall, Dawn Dowding

Objectives: There is no guidance to support the reporting of systematic reviews of mobile health (mHealth) apps (app reviews), so authors attempt to use or modify the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Reporting guidance tailored to app reviews, building on PRISMA where appropriate, is needed. The objectives were to describe the reporting quality of published mHealth app reviews, identify the need for a reporting guideline, and develop potential candidate items for it.

Materials and methods: We conducted a scoping review following the Joanna Briggs Institute and Arksey and O'Malley approaches. App reviews were identified in January 2024 from SCOPUS, CINAHL, AMED, EMBASE, Medline, PsycINFO, ACM Digital Library, snowballing reference lists, and forward citation searches. Data were extracted into Excel and analyzed using descriptive statistics and content synthesis, using PRISMA items as a framework.

Results: One hundred and seventy-one app reviews were identified, published from 2013 to 2024. Protocols were developed for 11% of the reviews, and only 52% reported the geographical location of the app markets. Few reported the duplicate removal process (12%), device and operating system used (30%), or made clear recommendations for the best-rated apps (18%). Nineteen PRISMA items were not reported by most (>85%) reviews, and 4 were modified by >30% of the reviews. Involvement of patient/public contributors (4%) or other stakeholders (11%) was infrequent. Overall, 34 candidate items and 10 subitems were identified to be considered for a new guideline.

Discussion and conclusion: App reviews were inconsistently reported, and many PRISMA items were not deemed relevant. Consensus work is needed to revise and prioritize the candidate items for a reporting guideline for systematic app reviews.

JAMIA Open 8(1): ooae159. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729727/pdf/
Citations: 0
Uncertainty estimation in diagnosis generation from large language models: next-word probability is not pre-test probability.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-01-10 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooae154
Yanjun Gao, Skatje Myers, Shan Chen, Dmitriy Dligach, Timothy Miller, Danielle S Bitterman, Guanhua Chen, Anoop Mayampurath, Matthew M Churpek, Majid Afshar

Objective: To evaluate large language models (LLMs) for pre-test diagnostic probability estimation and compare their uncertainty estimation performance with a traditional machine learning classifier.

Materials and methods: We assessed 2 instruction-tuned LLMs, Mistral-7B-Instruct and Llama3-70B-chat-hf, on predicting binary outcomes for Sepsis, Arrhythmia, and Congestive Heart Failure (CHF) using electronic health record (EHR) data from 660 patients. Three uncertainty estimation methods (Verbalized Confidence, Token Logits, and LLM Embedding+XGB) were compared against an eXtreme Gradient Boosting (XGB) classifier trained on raw EHR data. Performance metrics included AUROC and Pearson correlation between predicted probabilities.
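The abstract does not detail how the Token Logits method turns model output into a probability, so the sketch below shows one common approach as an assumption: apply a two-way softmax to the logits of a "yes" and a "no" answer token, then score the resulting probabilities with AUROC. The function names and example numbers are hypothetical, and the AUROC here is the plain pairwise (Mann-Whitney) formulation rather than any specific library's implementation.

```python
import math


def yes_probability(logit_yes: float, logit_no: float) -> float:
    """Two-way softmax over the 'yes' and 'no' token logits (assumed setup)."""
    peak = max(logit_yes, logit_no)  # subtract the max for numerical stability
    exp_yes = math.exp(logit_yes - peak)
    exp_no = math.exp(logit_no - peak)
    return exp_yes / (exp_yes + exp_no)


def auroc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up outcome labels and scores, for illustration only
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.1, 0.4, 0.6]
toy_auroc = auroc(toy_labels, toy_scores)
```

A probability derived this way reflects how likely the model is to emit a particular next token, which is exactly the distinction the article title draws against a clinically meaningful pre-test probability.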

Results: The XGB classifier outperformed the LLM-based methods across all tasks. LLM Embedding+XGB showed the closest performance to the XGB baseline, while Verbalized Confidence and Token Logits underperformed.

Discussion: These findings, consistent across multiple models and demographic groups, highlight the limitations of current LLMs in providing reliable pre-test probability estimations and underscore the need for improved calibration and bias mitigation strategies. Future work should explore hybrid approaches that integrate LLMs with numerical reasoning modules and calibrated embeddings to enhance diagnostic accuracy and ensure fairer predictions across diverse populations.

Conclusions: LLMs demonstrate potential but currently fall short in estimating diagnostic probabilities compared to traditional machine learning classifiers trained on structured EHR data. Further improvements are needed for reliable clinical use.

JAMIA Open 8(1): ooae154. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11723528/pdf/
Citations: 0
Assessing Artificial Intelligence (AI) Implementation for Assisting Gene Linking (at the National Library of Medicine).
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-01-07 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooae129
Rezarta Islamaj, Chih-Hsuan Wei, Po-Ting Lai, Melanie Huston, Cathleen Coss, Preeti Gokal Kochar, Nicholas Miliaras, James G Mork, Oleg Rodionov, Keiko Sekiya, Dorothy Trinh, Deborah Whitman, Craig Wallin, Zhiyong Lu

Objectives: The National Library of Medicine (NLM) currently indexes close to a million articles each year from more than 5300 medical and life sciences journals. Of these, a significant number contain critical information about the structure, genetics, and function of genes and proteins in normal and disease states. NLM curators identify these articles and manually link them to the corresponding gene records in the NCBI Gene database. The information is thus interconnected with all NLM resources and services, which brings considerable value to the life sciences. NLM aims to provide timely access to all metadata, which requires that article indexing scale to the volume of the published literature. However, although automatic information extraction methods have been shown to achieve accurate results in biomedical text mining research, it remains difficult to evaluate them on established pipelines and integrate them into daily workflows.

Materials and methods: Here, we demonstrate how our machine learning model, GNorm2, which achieved state-of-the-art performance in identifying genes and their corresponding species while handling innate textual ambiguities, could be integrated into the established daily workflow at the NLM and evaluated for its performance in this new environment.

Results: We worked with 8 expert biomedical curators and evaluated the integration using these parameters: (1) gene identification accuracy, (2) interannotator agreement with and without GNorm2, (3) GNorm2 potential bias, and (4) indexing consistency and efficiency. We identified key interface changes that significantly helped the curators maximize the benefit of GNorm2 and, based on the biocurator expert survey, further improved the GNorm2 algorithm to cover genes from 135 species, including viral and bacterial genes.

Conclusion: GNorm2 is currently being fully integrated into the curators' regular workflow.

Citations: 0
Development and validation of computable social phenotypes for health-related social needs.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-01-07 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooae150
Megan E Gregory, Suranga N Kasthurirathne, Tanja Magoc, Cassidy McNamee, Christopher A Harle, Joshua R Vest

Objective: Measurement of health-related social needs (HRSNs) is complex. We sought to develop and validate computable phenotypes (CPs) using structured electronic health record (EHR) data for food insecurity, housing instability, financial insecurity, transportation barriers, and a composite-type measure of these, using human-defined rule-based and machine learning (ML) classifier approaches.

Materials and methods: We collected HRSN surveys as the reference standard and obtained EHR data from 1550 patients in 3 health systems across 2 states. We followed a Delphi-like approach to develop the human-defined rule-based CP. For the ML classifier approach, we trained supervised ML (XGBoost) models using 78 features. Using surveys as the reference standard, we calculated sensitivity, specificity, positive predictive value, and area under the curve (AUC). We compared AUCs using the DeLong test and other performance measures using McNemar's test, and checked for differential performance.

Results: Most patients (63%) reported at least one HRSN on the reference standard survey. Human-defined rule-based CPs performed poorly (AUCs = .52 to .68). ML classifier CPs performed significantly better but remained poor to fair (AUCs = .68 to .75). Significant differences by race/ethnicity were found for ML classifier CPs (higher AUCs for White non-Hispanic patients). Important features included number of encounters and Medicaid insurance.

Discussion: Using a supervised ML classifier approach, HRSN CPs approached thresholds of fair performance, but exhibited differential performance by race/ethnicity.

Conclusion: CPs may help to identify patients who may benefit from additional social needs screening. Future work should explore the use of area-level features via geospatial data and natural language processing to improve model performance.
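
The performance measures named in the Materials and methods above (sensitivity, specificity, and positive predictive value against a survey reference standard) can be illustrated with a minimal sketch. This is not the authors' code; the function name `diagnostic_metrics` and the toy labels are hypothetical, chosen only to show how the three measures follow from a 2x2 confusion table.

```python
from typing import Dict, Sequence


def diagnostic_metrics(reference: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV of binary predictions against a binary reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    # Count the four cells of the confusion table.
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true needs that were flagged
        "specificity": ratio(tn, tn + fp),  # share of non-needs that were not flagged
        "ppv": ratio(tp, tp + fp),          # share of flags that were true needs
    }


# Hypothetical toy data: 1 = need reported on survey / flagged by phenotype.
survey = [1, 1, 1, 0, 0, 1, 0, 1]
phenotype = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = diagnostic_metrics(survey, phenotype)
print(metrics)
```

In practice a library routine would usually replace the hand-rolled counts; the explicit version is shown only because it makes the definitions visible.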

Citations: 0
Multi-modal prediction of extracorporeal support-a resource intensive therapy, utilizing a large national database.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-01-06 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooae158
Daoyi Zhu, Bing Xue, Neel Shah, Philip Richard Orrin Payne, Chenyang Lu, Ahmed Sameh Said

Objective: Extracorporeal membrane oxygenation (ECMO) is among the most resource-intensive therapies in critical care. The COVID-19 pandemic highlighted the lack of ECMO resource allocation tools. We aimed to develop a continuous ECMO risk prediction model to enhance patient triage and resource allocation.

Material and methods: We leveraged multimodal data from the National COVID Cohort Collaborative (N3C) to develop a hierarchical deep learning model, labeled "PreEMPT-ECMO" (Prediction, Early Monitoring, and Proactive Triage for ECMO), which integrates static and multi-granularity time series features to generate continuous predictions of ECMO utilization. Model performance was assessed across time points ranging from 0 to 96 hours prior to ECMO initiation, using both accuracy and precision metrics.

Results: Between January 2020 and May 2023, 101 400 patients were included, with 1298 (1.28%) supported on ECMO. PreEMPT-ECMO outperformed established predictive models, including Logistic Regression, Support Vector Machine, Random Forest, and Extreme Gradient Boosting Tree, in both accuracy and precision at all time points. Model interpretation analysis also highlighted variations in feature contributions through each patient's clinical course.

Discussion and conclusions: We developed a hierarchical model for continuous ECMO use prediction, utilizing a large multicenter dataset incorporating both static and time series variables of various granularities. This novel approach reflects the nuanced decision-making process inherent in ECMO initiation and has the potential to be used as an early alert tool to guide patient triage and ECMO resource allocation. Future directions include prospective validation and assessment of generalizability to non-COVID-19 refractory respiratory failure, with the aim of improving patient outcomes.

Citations: 0
Trustworthiness of a machine learning early warning model in medical and surgical inpatients.
IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-01-06 eCollection Date: 2025-02-01 DOI: 10.1093/jamiaopen/ooae156
Pedro J Caraballo, Anne M Meehan, Karen M Fischer, Parvez Rahman, Gyorgy J Simon, Genevieve B Melton, Hojjat Salehinejad, Bijan J Borah

Objectives: In general hospital wards, machine learning (ML)-based early warning systems (EWSs) can identify patients at risk of deterioration to facilitate rescue interventions. We assess subpopulation performance of an ML-based EWS on medical and surgical adult patients admitted to general hospital wards.

Materials and methods: We assessed the scores of an EWS integrated into the electronic health record and calculated every 15 minutes to predict a composite adverse event (AE): all-cause mortality, transfer to intensive care, cardiac arrest, or rapid response team evaluation. The distributions of the First Score 3 hours after admission, the Highest Score at any time during the hospitalization, and the Last Score just before an AE or dismissal without an AE were calculated. The Last Score was used to calculate the area under the receiver operating characteristic curve (ROC-AUC) and the precision-recall curve (PRC-AUC).

Results: From August 23, 2021 to March 31, 2022, 35 937 medical admissions had 2173 (6.05%) AEs compared to 25 214 surgical admissions with 4984 (19.77%) AEs. Medical and surgical admissions had significantly different (P < .001) distributions of the First Score, Highest Score, and Last Score among those with an AE and without an AE. The model performed better in the medical group than in the surgical group: ROC-AUC 0.869 versus 0.677, and PRC-AUC 0.988 versus 0.878, respectively.

Discussion: Heterogeneity of medical and surgical patients can significantly impact the performance of an ML-based EWS, changing the model validity and clinical discernment.

Conclusions: Characterization of the target patient subpopulations has clinical implications and should be considered when developing models to be used in general hospital wards.
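
As a reading aid for the ROC-AUC figures above, the following minimal sketch computes ROC-AUC from one score per admission using its rank interpretation: the probability that a randomly chosen admission with an event scores higher than one without, with ties counted as one half. It is an illustration under stated assumptions, not the study's pipeline; the function name `roc_auc` and the toy scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event admission scored higher: correctly ranked pair
            elif p == n:
                wins += 0.5   # tie: counted as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one "last score" per admission, 1 = adverse event occurred.
last_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
had_event = [1, 1, 1, 0, 0, 0]
auc = roc_auc(last_scores, had_event)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of admissions, so at the scale reported above a sorted-rank or library implementation would be the practical choice; running the same function separately on each subgroup is what a subpopulation comparison amounts to.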

Citations: 0