
JAMIA Open: Latest Articles

Development of a risk factor framework to inform machine learning prediction of young people's mental health problems: a Delphi study.
IF 3.4 | Q2 | HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-23 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf166
Katherine Parkin, Ryan Crowley, Rachel Sippy, Shabina Hayat, Yi Zhang, Emily Brewis, Nicole Marshall, Tara Ramsay-Patel, Vahgisha Thirugnanasampanthan, Guy Skinner, Peter Fonagy, Carol Brayne, Anna Moore

Objectives: To create a theoretical framework of mental health risk factors to inform the development of prediction models for young people's mental health problems.

Materials and methods: We created an initial prototype theoretical framework using a rapid literature search and stakeholder discussion. A snowball sampling approach identified experts for the Delphi study. Round 1 sought consensus on the overall approach, framework domains, and life course stages. Round 2 aimed to establish the points in the life course where exposure to specific risk factors would be most influential. Round 3 ranked risk factors within domains by their predictive importance for young people's mental health problems.
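
The round 3 ranking step can be illustrated with a small sketch. The abstract does not say how individual expert rankings were combined, so the mean-rank aggregation below is an assumption, and the `aggregate_ranks` helper and the item names are hypothetical.

```python
from statistics import mean


def aggregate_ranks(expert_rankings):
    """Order items by mean rank across experts (rank 1 = most important)."""
    positions = {}
    for ranking in expert_rankings:
        for rank, item in enumerate(ranking, start=1):
            positions.setdefault(item, []).append(rank)
    # Sort by mean rank; break ties alphabetically so the result is deterministic.
    return sorted(positions, key=lambda item: (mean(positions[item]), item))


# Hypothetical rankings from three experts within a single domain.
rankings = [
    ["factor_a", "factor_b", "factor_c"],
    ["factor_b", "factor_a", "factor_c"],
    ["factor_a", "factor_c", "factor_b"],
]
consensus = aggregate_ranks(rankings)
```

Mean rank is only one possible choice; median rank or a Borda count would fit the same interface.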

Results: The final framework reached consensus after 3 rounds and included 287 risk factors across 8 domains and 5 life course stages. Twenty-five experts completed round 3. Domains ranked as most important were "Social and Environmental" and "Psychological and Mental Health." Ranked lists of risk factors within domains and heat maps showing the salience of risk factors across life course stages were generated.

Discussion: The study integrated multidisciplinary expert perspectives and prioritized health equity throughout the framework's development. The ranked risk factor lists and life stage heat maps support the targeted inclusion of risk factors across developmental stages in prediction models.

Conclusion: This theoretical framework provides a roadmap of important risk factors for inclusion in early identification models, with the aim of improving the accuracy with which childhood mental health problems are predicted. It offers a useful theoretical reference point to support model building by those without domain expertise.

Citations: 0
Higher electronic health record burden among women physicians in academic ambulatory medicine.
IF 3.4 | Q2 | HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-17 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf164
Sarah Y Bessen, Sean Tackett, Kimberly S Peairs, Lisa Christopher-Stine, Charles M Stewart, Lee D Biddison, Maria Oliva-Hemker, Jennifer K Lee

Objectives: Electronic health record (EHR) work may affect women and men physicians differently. Identifying gender discrepancies in EHR work across specialties may inform strategies to reduce EHR burdens.

Materials and methods: We retrospectively evaluated EHR use by ambulatory physicians in 4 specialties (2 procedural [cardiology and gastroenterology] and 2 nonprocedural [internal medicine and rheumatology]) during 1 year at a large academic medical institution. Gender differences in EHR and clinical workload across specialties were evaluated by analysis of variance. Mixed-effects linear regression models analyzed gender differences in EHR workload controlling for specialty. Significant differences were additionally examined by stratifying procedural and nonprocedural specialties.
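
The analysis of variance mentioned above compares variability between groups with variability within groups. The sketch below computes a one-way ANOVA F statistic in plain Python; the `one_way_anova_f` helper and the sample values are hypothetical, and it covers neither the P value (which needs the F distribution from a statistics library) nor the mixed-effects models used in the study.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    # Variation of each group mean around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical EHR minutes per appointment for three specialties.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A larger F value indicates that group means differ more than within-group scatter alone would suggest.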

Results: Clinical and EHR workload varied across specialties (P < .05), though scheduled clinical workload did not differ by gender. Controlling for specialty, women physicians spent more time per appointment on In Basket messages (P = .001), sent more Secure Chat messages per appointment (P = .003), and spent more time in the EHR outside 7:00 AM-7:00 PM (P < .001) than men. Gender differences in messaging were concentrated among the procedural physicians. Women procedural physicians spent more time on In Basket messages (P < .001) and sent more Secure Chat messages (P = .007) than men, whereas these differences did not occur among nonprocedural physicians.

Discussion: Women physicians had greater EHR burdens than men despite similar scheduled clinical workloads. The greater messaging workload predominantly affected women procedural physicians.

Conclusion: Gender disparities in EHR burden in ambulatory specialties vary between procedural and nonprocedural fields. Future research is needed to mitigate gender inequity in EHR workloads.

Citations: 0
Exploring common data model coverage of nursing flowsheet data: a pilot study using SNOMED CT and LOINC mapping.
IF 3.4 | Q2 | HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-14 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf168
Robin Austin, Malin Britt Lalich, Katy Stewart, Jonna Zarbano, Matthew Byrne, Melissa D Pinto, Elizabeth E Umberfield

Objectives: The primary objective of this research is to assess the content coverage of nursing data within a publicly available common data model (CDM), focusing on how nursing data, documented in flowsheets, are represented within the model.

Materials and methods: This mapping study was informed by previous evaluation studies and serves as a framework for evaluating information resources, including guiding their development and implementation. The overall research process consists of 4 steps: (1) identify a CDM; (2) define evaluation criteria; (3) map nursing flowsheet data; and (4) apply evaluation criteria.

Results: Overall, 65.5% (n = 1170) of the flowsheet concepts were mapped to Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and Logical Observation Identifiers Names and Codes (LOINC) target codes and 56.0% (n = 1831) of the flowsheet values were mapped to SNOMED CT and LOINC target codes. The flowsheet concepts had a higher average mapping time per concept/reviewer (1.19 min) as compared to the average mapping time per value/reviewer (0.64 min).
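
The coverage percentages above are, in essence, the share of source items that received at least one target code. A minimal sketch of that calculation follows; the `mapping_coverage` helper, the concept names, and the code strings are all hypothetical placeholders, not real SNOMED CT or LOINC identifiers or data from the study.

```python
def mapping_coverage(mappings):
    """Percentage of source items that have at least one target code."""
    if not mappings:
        raise ValueError("no items to evaluate")
    # An item counts as mapped when its code list is present and non-empty.
    mapped = sum(1 for codes in mappings.values() if codes)
    return 100.0 * mapped / len(mappings)


# Hypothetical flowsheet concepts with placeholder target codes.
concept_map = {
    "concept_a": ["SNOMED:placeholder-1"],
    "concept_b": ["LOINC:placeholder-2", "SNOMED:placeholder-3"],
    "concept_c": [],
    "concept_d": None,
}
coverage = mapping_coverage(concept_map)
```

The same function could be applied separately to flowsheet concepts and to flowsheet values to obtain the two figures reported.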

Discussion: This mapping study demonstrated the progress and ongoing challenges of mapping nursing data to a national common data model. However, the ability to use nursing data at scale in a national CDM remains limited until more comprehensive mapping is completed.

Conclusion: This mapping study identifies a significant gap in integrating nursing data into a national common data model, highlighting an opportunity to enhance patient care through improved real-time insights and evidence-based nursing practices. Addressing this gap can help shape policies that prioritize the inclusion of nursing data. Additionally, aligning nursing data at scale can advance research, increase efficiency, and optimize nurse-sensitive patient outcomes.

Citations: 0
Correction to: Response to survey directed to patient portal members differs by age, race, and healthcare utilization.
IF 3.4 | Q2 | HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-10 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf124

[This corrects the article DOI: 10.1093/jamiaopen/ooz061.]

Citations: 0
Automated classification of exposure and encourage events in speech data from pediatric OCD treatment.
IF 3.4 | Q2 | HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-09 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf127
Juan Antonio Lossio-Ventura, Samuel Frank, Grace Ringlein, Kirsten Bonson, Ardyn Olszko, Abbey Knobel, Daniel S Pine, Jennifer B Freeman, Kristen Benito, David C Jangraw, Francisco Pereira

Objective: To develop and evaluate an automated classification system for labeling Exposure Process Coding System (EPCS) quality codes (specifically exposure and encourage events) during in-person exposure therapy sessions using automatic speech recognition (ASR) and natural language processing techniques.

Materials and methods: The system was trained and tested on 360 manually labeled pediatric Obsessive-Compulsive Disorder (OCD) therapy sessions from 3 clinical trials. Audio recordings were transcribed using ASR tools (OpenAI's Whisper and Google Speech-to-Text). Transcription accuracy was evaluated via word error rate (WER) on manual transcriptions of 2-minute audio segments compared against ASR-generated transcripts. The resulting text was analyzed with transformer-based models, including Bidirectional Encoder Representations from Transformers (BERT), Sentence-BERT, and Meta Llama 3. Models were trained to predict EPCS codes in 2 classification settings: sequence-level classification, where events are labeled in delimited text chunks, and token-level classification, where event boundaries are unknown. Classification was performed either with fine-tuned transformer-based models, or with logistic regression on embeddings produced by each model.
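
Word error rate, as used above, is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below shows that standard definition; the lowercasing and whitespace tokenization are assumptions, since the abstract does not describe the text normalization applied, and the sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.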

Results: With respect to transcription accuracy, Whisper outperformed Google Speech-to-Text with a lower WER (0.31 vs 0.51). In the sequence-level classification setting, Llama 3 models achieved high performance with area under the ROC curve (AUC) scores of 0.95 for exposures and 0.75 for encourage events, outperforming traditional methods and standard BERT models. In the token-level setting, fine-tuned BERT models performed best, achieving AUC scores of 0.85 for exposures and 0.75 for encourage events.
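
The AUC values reported here can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal pairwise sketch of that interpretation follows; the `roc_auc` helper and the labels and scores are hypothetical, and a real evaluation would normally rely on an established library.

```python
def roc_auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = event present) and classifier scores.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise loop is quadratic in the number of examples, which is acceptable for illustration but not for large evaluation sets.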

Discussion and conclusion: Current ASR and transformer-based models enable automated quality coding of in-person exposure therapy sessions. These findings demonstrate potential for real-time assessment in clinical practice and scalable research on effective therapy methods. Future work should focus on optimization, including improvements in ASR accuracy, expanding training datasets, and multimodal data integration.

Citations: 0
Utilizing natural language processing to identify cancer-relevant publications at a National Cancer Institute-designated cancer center.
IF 3.4 | Q2 | HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-09 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf156
Whitney Shae, Md Saiful Islam Saif, John Fife, Dinesh Pal Mudaranthakam, Dong Pei, Lisa Harlan-Williams, Jeffrey A Thompson, Devin C Koestler

Objectives: The objective of this study was to develop and test natural language processing (NLP) methods for screening and, ultimately, predicting the cancer relevance of peer-reviewed publications.

Materials and methods: Two datasets were used: (1) manually curated publications labeled for cancer relevance, co-authored by members of The University of Kansas Cancer Center (KUCC) and (2) a derived dataset containing cancer-related abstracts from American Association for Cancer Research journals and noncancer-related abstracts from other medical journals. Two text encoding methods were explored: term frequency-inverse document frequency (TF-IDF) vectorization and various BERT embeddings. These representations served as inputs to 3 supervised machine learning classifiers: Support Vector Classification (SVC), Gradient Boosting Classification, and Multilayer Perceptron (MLP) neural networks. Model performance was evaluated by comparing predictions to the "true" cancer-relevant labels in a withheld test set.
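
TF-IDF vectorization, one of the two encodings named above, weights each term by how often it appears in a document and how rare it is across the collection. The sketch below uses the textbook form tf * log(N / df) with whitespace tokenization; both choices are assumptions, the `tfidf` helper and the sample texts are hypothetical, and library implementations typically add smoothing and normalization that are omitted here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical abstract snippets.
abstracts = [
    "tumor growth in mice",
    "tumor suppressor gene",
    "nursing workload survey",
]
vectors = tfidf(abstracts)
```

Terms that occur in every document receive a weight of zero, which is why such vectors emphasize vocabulary that distinguishes one group of abstracts from another.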

Results: All machine learning models performed best when trained and tested within the derived dataset. Across the datasets, SVC and MLP both exhibited strong performance, with F1 scores as high as 0.976 and 0.997, respectively. BioBERT embeddings resulted in slightly higher metrics when compared to TF-IDF vectorization across most models.
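
The F1 scores quoted above are the harmonic mean of precision and recall. A minimal sketch of the binary case follows; the `f1_score` helper and the labels are hypothetical and are not drawn from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = cancer-relevant, 0 = not cancer-relevant.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Unlike accuracy, F1 ignores true negatives, so it stays informative when one class is much more common than the other.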

Discussion: Models trained on the derived data performed very well internally; however, weaker performance was noted when these models were tested on the KUCC dataset. This finding highlights the subjective nature of cancer-relevant determinations. In contrast, KUCC-trained models had high predictive performance when tested on the derived-specific classifications, showing that models trained on the KUCC dataset may be suitable for wider cancer-relevant prediction.

Conclusions: Overall, our results suggest that NLP can effectively automate the classification of cancer-relevant publications, enhancing research productivity tracking; however, great care should be taken in selecting the appropriate data, text representation approach, and machine learning approach.

Citations: 0
Multimodal feature analysis for automated neonatal jaundice assessment using machine learning.
IF 3.4 | Q2 | HEALTH CARE SCIENCES & SERVICES | Pub Date: 2025-12-06 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf165
Yunfeng Liang, Lin Zou, Millie Ming Rong Goh, Alvin Jia Hao Ngeow, Ngiap Chuan Tan, Andy Wee An Ta, Han Leong Goh

Objective: Neonatal jaundice monitoring is resource-intensive. Existing artificial intelligence methods use image or clinical data, but none systematically combine both or compare feature contributions. This study fills that gap by extracting and analyzing multimodal features on a large dataset, identifying an optimal feature set for accurate, accessible jaundice assessment.

Materials and methods: This study collected clinical data and skin images from 3 body regions of 633 neonates, generating 460 features across 4 categories. Four tree-based models were used to predict total serum bilirubin levels, and feature importance analysis guided the selection of an optimal feature set.

Results: The best performance was achieved by the Light Gradient Boosting Machine (LGBM) model with 140 selected features, yielding a root mean square error (RMSE) of 2.0477 mg/dL and a Pearson correlation of 0.8435. This is an RMSE improvement of more than 10% over models using only a single data modality. Moreover, selecting the top 30 features ranked by SHapley Additive exPlanations (SHAP) substantially reduces data dimensionality while keeping performance within 5% of the optimal model.

Discussion: Color features contributed over 60% of the total importance, with clinical data adding more than 25%, led by hour of life. Light temperature also affected predictions, while texture features had minimal impact. Among body regions, the abdomen provided the most informative signals for jaundice severity.

Conclusion: The proposed algorithm shows promise for real-world use by enabling timely, automated jaundice assessment for families, while also offering insights for future research and broader medical applications.
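
As a rough illustration of the metrics quoted in the Results above, the following Python sketch computes RMSE, the Pearson correlation, and the relative RMSE improvement of one model over another. It is not the authors' code: the function names and the bilirubin values are hypothetical and chosen only to make the example self-contained.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction of the new model relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse


if __name__ == "__main__":
    # Hypothetical bilirubin values in mg/dL, for illustration only.
    measured = [5.1, 8.4, 12.0, 15.3, 9.7]
    predicted = [6.0, 7.9, 10.5, 14.8, 11.2]
    print("RMSE:", rmse(measured, predicted))
    print("Pearson r:", pearson_r(measured, predicted))
```

Under this definition, a statement such as "a gain of over 10% in RMSE" corresponds to `relative_improvement(single_modality_rmse, multimodal_rmse) > 0.10`, assuming the gain is measured relative to the single-modality baseline.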

Citations: 0
Using machine learning algorithms to optimize treatment with high-cost biologics in a national cohort of patients with inflammatory bowel disease.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-03 eCollection Date: 2025-12-01 DOI: 10.1093/jamiaopen/ooaf162
Jason K Hou, Tiffany M Tang, Shubhada Sansgiry, Tony Van, Peter A Richardson, Codey Pham, Francesca Cunningham, Jessica A Baker, Ji Zhu, Akbar K Waljee

Objectives: Prediction models using statistical or machine learning (ML) approaches can enhance clinical decision support tools. Infliximab (IFX), a biologic with a newly introduced biosimilar for Crohn's disease (CD) and ulcerative colitis (UC), presents an opportunity to evaluate these tools at time of biosimilar switch to predict disease flares. This study sought to evaluate real-world safety and effectiveness of nonmedical IFX biosimilar switch in a national US cohort of CD and UC patients, and to develop and compare interpretable models for predicting adverse clinical events among patients on maintenance IFX.

Materials and methods: This retrospective cohort study used administrative and clinical data from the National Veterans Health Administration Corporate Data Warehouse. It included 2529 Veterans with CD or UC on maintenance IFX (2017-2020), either continuing originator IFX or switching to a biosimilar. The primary outcome was disease-related flare. Classification and survival models were developed using traditional and ML methods and assessed via receiver operating characteristic curve, precision-recall curve, and decision curve analysis.

Results: In 2529 Veterans with CD or UC, biosimilar switch had low predictive importance across survival models. Objective laboratory-related information yielded the highest validation. Random forest+ (RF+) outperformed all other statistical and ML models. Prior flares and total health-care encounters were the 2 most important predictors, while hemoglobin was the top laboratory predictor.

Conclusions: Prediction models, particularly RF+, may aid in optimizing biologic therapy for CD and UC by identifying patients at higher risk of flare following a biosimilar switch.
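
Of the evaluation tools named in the Materials and methods above, decision curve analysis is probably the least familiar. The sketch below shows the standard net benefit calculation at a single threshold probability, alongside the "treat all" reference strategy. It is a minimal illustration, not the study's code: the function names and the example labels and probabilities are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold.

    Uses the usual decision-curve definition:
        TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of flagging every patient, used as a reference curve."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical flare labels (1 = flare) and predicted risks.
    labels = [1, 1, 0, 0, 1, 0, 0, 0]
    risks = [0.9, 0.6, 0.7, 0.1, 0.4, 0.2, 0.3, 0.05]
    for pt in (0.1, 0.3, 0.5):
        print(pt, net_benefit(labels, risks, pt), net_benefit_treat_all(labels, pt))
```

A decision curve plots these values over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy).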

Citations: 0
Evaluating sociodemographic bias in a deployed machine-learned patient deterioration model.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-01 DOI: 10.1093/jamiaopen/ooaf158
Michael Colacci, Chloe Pou-Prom, Arjumand Siddiqi, Muhammad Mamdani, Amol A Verma

Background: Bias evaluations of machine learning (ML) models often focus on performance in research settings, with limited assessment of downstream bias following clinical deployment. The objective of this study was to evaluate whether CHARTwatch, a real-time ML early warning system for inpatient deterioration, demonstrated algorithmic bias in model performance or produced disparities in care processes and outcomes across patient sociodemographic groups.

Methods: We evaluated CHARTwatch implementation on the internal medicine service at a large academic hospital. Patient outcomes during the intervention period (November 1, 2020-June 1, 2022) were compared to the control period (November 1, 2016-December 31, 2019) using propensity score overlap weighting. We evaluated differences across key sociodemographic subgroups, including age, sex, homelessness, and neighborhood-level socioeconomic and racialized composition. Outcomes included model performance (sensitivity and specificity), processes of care, and patient outcomes (non-palliative in-hospital death).

Results: Among 12 877 patients (9079 control, 3798 intervention), 13.3% were experiencing homelessness and 36.9% lived in the quintile with the highest neighborhood racialized and newcomer populations. Model sensitivity was 70.1% overall, with no significant variation across subgroups. Model specificity varied by age (<60 years: 93% [95% confidence interval (CI) 91-95%]; 60-80 years: 90% [95% CI 87-92%]; >80 years: 84% [95% CI 79-88%]; P < .001) but not across other subgroups. CHARTwatch implementation was associated with an increase in code status documentation among patients experiencing homelessness, without significant differences in other care processes or outcomes.

Conclusion: CHARTwatch model performance and impact were generally consistent across measured sociodemographic subgroups. ML-based clinical decision support tools, and associated standardization of care, may reduce existing inequities, as was observed for code status orders among patients experiencing homelessness. This evaluation provides a framework for future bias assessments of deployed ML-CDS tools.
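
The two quantitative ingredients of this evaluation, subgroup sensitivity and specificity and propensity score overlap weighting, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CHARTwatch evaluation code: the function names, record layout, and sample values are hypothetical, and the propensity scores are assumed to have been estimated elsewhere.

```python
from collections import defaultdict


def sens_spec_by_group(records):
    """Compute sensitivity and specificity per subgroup.

    records: iterable of (group, y_true, y_pred) with binary 0/1 labels.
    Returns {group: {"sensitivity": float or None, "specificity": float or None}},
    where None marks a subgroup with no positives (or no negatives).
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            counts[group]["tp" if y_pred == 1 else "fn"] += 1
        else:
            counts[group]["fp" if y_pred == 1 else "tn"] += 1
    result = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        result[group] = {
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return result


def overlap_weights(in_intervention, propensity):
    """Overlap weights: 1 - e for intervention-period patients, e for controls.

    propensity[i] is the estimated probability that patient i belongs to the
    intervention period given their covariates.
    """
    return [1.0 - e if z == 1 else e for z, e in zip(in_intervention, propensity)]


if __name__ == "__main__":
    # Hypothetical (age group, deteriorated, alert fired) records.
    sample = [
        ("<60", 1, 1), ("<60", 1, 0), ("<60", 0, 0),
        (">80", 1, 1), (">80", 0, 1), (">80", 0, 0),
    ]
    print(sens_spec_by_group(sample))
    print(overlap_weights([1, 0, 1], [0.3, 0.3, 0.8]))
```

Overlap weights give the most influence to patients whose propensity score is near 0.5, that is, patients who could plausibly have appeared in either period, which is why the method is often chosen for before-and-after comparisons like this one.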

Citations: 0
Biomedical data repositories require governance for artificial intelligence/machine learning applications at every step.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-01 DOI: 10.1093/jamiaopen/ooaf134
Ellen Wright Clayton, Susannah Rose, Camille Nebecker, Laurie Novak, Yael Bensoussan, You Chen, Benjamin X Collins, Ashley Cordes, Barbara J Evans, Kadija S Ferryman, Samantha Hurst, Xiaoqian Jiang, Aaron Y Lee, Shannon McWeeney, Jillian Parker, Jean-Christophe Bélisle-Pipon, Eric Rosenthal, Zhijun Yin, Joseph Yracheta, Bradley Adam Malin

Objectives: The NIH's Bridge2AI Program has funded 4 "new flagship biomedical and behavioral datasets that are properly documented and ready for use with AI [artificial intelligence] or ML [machine learning] technologies" to promote the adoption of AI. This article discusses the challenges and lessons learned in data collection and governance to ensure their responsible use.

Materials and methods: We outline major steps involved in creating and using these datasets in ethically acceptable ways, including (1) data selection: what data are being selected and why, (2) increasing attention to public concerns, (3) the role of participant consent depending on data source, (4) ensuring responsible use, (5) where and how data are stored, (6) what control participants have over data sharing, (7) data access, and (8) data download.

Results: We discuss ethical, legal, social, and practical challenges raised at each step of creating AI-ready datasets, noting the importance of addressing issues of future data storage and use. We identify some of the many choices that these projects have made, including how to incorporate public input, where to store data, and defining criteria for access to and downloading data.

Discussion: The processes involved in the establishment and governance of the Bridge2AI datasets vary widely but have common elements, suggesting opportunities for future programs to lean upon Bridge2AI strategies.

Conclusions: This article discusses the challenges and lessons learned in data collection and governance to ensure their responsible use, particularly as confronted by the 4 distinct projects funded by this program.

Citations: 0