JAMIA Open: Latest Articles

Correction to: Response to survey directed to patient portal members differs by age, race, and healthcare utilization.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-10 eCollection Date: 2025-12-01 DOI: 10.1093/jamiaopen/ooaf124

[This corrects the article DOI: 10.1093/jamiaopen/ooz061.].

Citations: 0
Automated classification of exposure and encourage events in speech data from pediatric OCD treatment.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-09 eCollection Date: 2025-12-01 DOI: 10.1093/jamiaopen/ooaf127
Juan Antonio Lossio-Ventura, Samuel Frank, Grace Ringlein, Kirsten Bonson, Ardyn Olszko, Abbey Knobel, Daniel S Pine, Jennifer B Freeman, Kristen Benito, David C Jangraw, Francisco Pereira

Objective: To develop and evaluate an automated classification system for labeling Exposure Process Coding System (EPCS) quality codes (specifically exposure and encourage events) during in-person exposure therapy sessions using automatic speech recognition (ASR) and natural language processing techniques.

Materials and methods: The system was trained and tested on 360 manually labeled pediatric Obsessive-Compulsive Disorder (OCD) therapy sessions from 3 clinical trials. Audio recordings were transcribed using ASR tools (OpenAI's Whisper and Google Speech-to-Text). Transcription accuracy was evaluated via word error rate (WER), comparing ASR-generated transcripts against manual transcriptions of 2-minute audio segments. The resulting text was analyzed with transformer-based models, including Bidirectional Encoder Representations from Transformers (BERT), Sentence-BERT, and Meta Llama 3. Models were trained to predict EPCS codes in 2 classification settings: sequence-level classification, where events are labeled in delimited text chunks, and token-level classification, where event boundaries are unknown. Classification was performed either with fine-tuned transformer-based models or with logistic regression on embeddings produced by each model.
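As a minimal sketch of how a word error rate can be computed, assuming the usual definition (word-level edit distance divided by the number of reference words): the function name, the lowercase/whitespace tokenization, and both transcripts below are illustrative assumptions, not details taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words consumed
    # so far (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, invented for illustration only.
manual = "take one more step toward the door"
asr = "take one step towards the door"
print(round(word_error_rate(manual, asr), 3))  # 0.286 (1 deletion + 1 substitution over 7 words)
```

Under this definition a lower value is better, and the value can exceed 1 when the hypothesis contains many inserted words.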

Results: With respect to transcription accuracy, Whisper outperformed Google Speech-to-Text with a lower WER (0.31 vs 0.51). In the sequence-level classification setting, Llama 3 models achieved high performance with area under the ROC curve (AUC) scores of 0.95 for exposures and 0.75 for encourage events, outperforming traditional methods and standard BERT models. In the token-level setting, fine-tuned BERT models performed best, achieving AUC scores of 0.85 for exposures and 0.75 for encourage events.
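For readers less familiar with AUC, the sketch below computes it from its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are invented, and the function name is an assumption; nothing here reproduces the study's models or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = event present) and classifier scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

The pairwise loop is quadratic in the number of examples; it is written this way for readability rather than speed.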

Discussion and conclusion: Current ASR and transformer-based models enable automated quality coding of in-person exposure therapy sessions. These findings demonstrate potential for real-time assessment in clinical practice and scalable research on effective therapy methods. Future work should focus on optimization, including improvements in ASR accuracy, expanding training datasets, and multimodal data integration.

Citations: 0
Utilizing natural language processing to identify cancer-relevant publications at a National Cancer Institute-designated cancer center.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-09 eCollection Date: 2025-12-01 DOI: 10.1093/jamiaopen/ooaf156
Whitney Shae, Md Saiful Islam Saif, John Fife, Dinesh Pal Mudaranthakam, Dong Pei, Lisa Harlan-Williams, Jeffrey A Thompson, Devin C Koestler

Objectives: The objective of this study was to develop and test natural language processing (NLP) methods for screening and, ultimately, predicting the cancer relevance of peer-reviewed publications.

Materials and methods: Two datasets were used: (1) manually curated publications labeled for cancer relevance, co-authored by members of The University of Kansas Cancer Center (KUCC) and (2) a derived dataset containing cancer-related abstracts from American Association for Cancer Research journals and noncancer-related abstracts from other medical journals. Two text encoding methods were explored: term frequency-inverse document frequency (TF-IDF) vectorization and various BERT embeddings. These representations served as inputs to 3 supervised machine learning classifiers: Support Vector Classification (SVC), Gradient Boosting Classification, and Multilayer Perceptron (MLP) neural networks. Model performance was evaluated by comparing predictions to the "true" cancer-relevant labels in a withheld test set.
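To illustrate the TF-IDF representation mentioned above, here is a minimal standard-library sketch. It assumes one common smoothed variant (term frequency times `log((1 + N) / (1 + df)) + 1`); the abstract does not state which weighting or tokenizer the authors used, and the two example sentences are invented.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using a smoothed TF-IDF weighting."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0
           for tok in vocabulary}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocabulary])
    return vocabulary, vectors


# Hypothetical one-line "abstracts", invented for illustration only.
docs = ["tumor cells respond to therapy",
        "cardiac cells respond to exercise"]
vocab, vecs = tfidf_vectors(docs)
for tok, weight in zip(vocab, vecs[0]):
    print(f"{tok:10s} {weight:.3f}")
```

Terms that appear in only one document ("tumor") receive a higher weight than terms shared by both ("cells"), which is what lets a downstream classifier separate the two classes.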

Results: All machine learning models performed best when trained and tested within the derived dataset. Across the datasets, SVC and MLP both exhibited strong performance, with F1 scores as high as 0.976 and 0.997, respectively. BioBERT embeddings resulted in slightly higher metrics when compared to TF-IDF vectorization across most models.
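The F1 score reported above is the harmonic mean of precision and recall. The sketch below shows the arithmetic on invented labels; the function name and data are assumptions for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = cancer-relevant, 0 = not cancer-relevant.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(round(f1_score(y_true, y_pred), 3))  # 0.8 (precision 0.8, recall 0.8)
```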

Discussion: Models trained on the derived data performed very well internally; however, weaker performance was noted when these models were tested on the KUCC dataset. This finding highlights the subjective nature of cancer-relevant determinations. In contrast, KUCC-trained models had high predictive performance when tested on the derived-specific classifications, showing that models trained on the KUCC dataset may be suitable for wider cancer-relevant prediction.

Conclusions: Overall, our results suggest that NLP can effectively automate the classification of cancer-relevant publications, enhancing research productivity tracking; however, great care should be taken in selecting the appropriate data, text representation approach, and machine learning approach.

Citations: 0
Multimodal feature analysis for automated neonatal jaundice assessment using machine learning.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-06 eCollection Date: 2025-12-01 DOI: 10.1093/jamiaopen/ooaf165
Yunfeng Liang, Lin Zou, Millie Ming Rong Goh, Alvin Jia Hao Ngeow, Ngiap Chuan Tan, Andy Wee An Ta, Han Leong Goh

Objective: Neonatal jaundice monitoring is resource-intensive. Existing artificial intelligence methods use image or clinical data, but none systematically combine both or compare feature contributions. This study fills that gap by extracting and analyzing multimodal features on a large dataset, identifying an optimal feature set for accurate, accessible jaundice assessment.

Materials and methods: This study collected clinical data and skin images from 3 body regions of 633 neonates, generating 460 features across 4 categories. Four tree-based models were used to predict total serum bilirubin levels, and feature importance analysis guided the selection of an optimal feature set.

Results: The optimal performance was achieved using the Light Gradient Boosting Machine (LGBM) model with 140 selected features, yielding a root mean square error (RMSE) of 2.0477 mg/dL and a Pearson correlation of 0.8435. This represents a performance gain of over 10% in RMSE compared to models using only a single data modality. Moreover, selecting the top 30 features based on SHapley Additive exPlanation (SHAP) allows for a substantial reduction in data dimensionality, while maintaining performance within 5% of the optimal model.
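The two headline metrics here, RMSE and the Pearson correlation, can be computed as in the sketch below. The bilirubin values are invented for illustration and do not come from the study; the function names are likewise assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical total serum bilirubin values in mg/dL.
measured = [5.0, 8.0, 12.0, 15.0]
predicted = [6.0, 7.0, 13.0, 14.0]
print(round(rmse(measured, predicted), 3))       # 1.0
print(round(pearson_r(measured, predicted), 3))  # 0.966
```

RMSE is expressed in the units of the outcome (mg/dL here), whereas the Pearson correlation is unitless and insensitive to a constant offset in the predictions, so the two are usually reported together.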

Discussion: Color features contributed over 60% of the total importance, with clinical data adding more than 25%, led by hour of life. Light temperature also affected predictions, while texture features had minimal impact. Among body regions, the abdomen provided the most informative signals for jaundice severity.

Conclusion: The proposed algorithm shows promise for real-world use by enabling timely, automated jaundice assessment for families, while also offering insights for future research and broader medical applications.

Citations: 0
Using machine learning algorithms to optimize treatment with high-cost biologics in a national cohort of patients with inflammatory bowel disease.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-03 eCollection Date: 2025-12-01 DOI: 10.1093/jamiaopen/ooaf162
Jason K Hou, Tiffany M Tang, Shubhada Sansgiry, Tony Van, Peter A Richardson, Codey Pham, Francesca Cunningham, Jessica A Baker, Ji Zhu, Akbar K Waljee

Objectives: Prediction models using statistical or machine learning (ML) approaches can enhance clinical decision support tools. Infliximab (IFX), a biologic with a newly introduced biosimilar for Crohn's disease (CD) and ulcerative colitis (UC), presents an opportunity to evaluate these tools at time of biosimilar switch to predict disease flares. This study sought to evaluate real-world safety and effectiveness of nonmedical IFX biosimilar switch in a national US cohort of CD and UC patients, and to develop and compare interpretable models for predicting adverse clinical events among patients on maintenance IFX.

Materials and methods: This retrospective cohort study used administrative and clinical data from the National Veterans Health Administration Corporate Data Warehouse. It included 2529 Veterans with CD or UC on maintenance IFX (2017-2020), either continuing originator IFX or switching to a biosimilar. The primary outcome was disease-related flare. Classification and survival models were developed using traditional and ML methods and assessed via receiver operating characteristic curve, precision-recall curve, and decision curve analysis.

Results: In 2529 Veterans with CD or UC, biosimilar switch had low predictive importance across survival models. Objective laboratory-related information yielded the highest validation. Random forest+ (RF+) outperformed all other statistical and ML models. Prior flares and total health-care encounters were the 2 most important predictors, while hemoglobin was the top laboratory predictor.

Conclusions: Prediction models, particularly RF+, may aid in optimizing biologic therapy for CD and UC by identifying patients at higher risk of flare following a biosimilar switch.

Citations: 0
Evaluating sociodemographic bias in a deployed machine-learned patient deterioration model.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-01 DOI: 10.1093/jamiaopen/ooaf158
Michael Colacci, Chloe Pou-Prom, Arjumand Siddiqi, Muhammad Mamdani, Amol A Verma

Background: Bias evaluations of machine learning (ML) models often focus on performance in research settings, with limited assessment of downstream bias following clinical deployment. The objective of this study was to evaluate whether CHARTwatch, a real-time ML early warning system for inpatient deterioration, demonstrated algorithmic bias in model performance or produced disparities in care processes and outcomes across patient sociodemographic groups.

Methods: We evaluated CHARTwatch implementation on the internal medicine service at a large academic hospital. Patient outcomes during the intervention period (November 1, 2020-June 1, 2022) were compared to the control period (November 1, 2016-December 31, 2019) using propensity score overlap weighting. We evaluated differences across key sociodemographic subgroups, including age, sex, homelessness, and neighborhood-level socioeconomic and racialized composition. Outcomes included model performance (sensitivity and specificity), processes of care, and patient outcomes (non-palliative in-hospital death).
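Propensity score overlap weighting assigns each patient in one period a weight equal to the estimated probability of belonging to the other period: intervention-period patients are weighted by 1 - e and control-period patients by e, where e is the propensity score. The sketch below applies that rule to invented scores and outcomes. How the study estimated its propensity scores is not described in the abstract, so the scores are simply given as inputs here, and all names are hypothetical.

```python
def overlap_weights(treated, propensity):
    """Overlap weights: 1 - e for treated units, e for control units."""
    return [1.0 - e if t == 1 else e for t, e in zip(treated, propensity)]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_risk_difference(treated, outcome, propensity):
    """Weighted outcome rate in the treated group minus the control group."""
    weights = overlap_weights(treated, propensity)
    t_vals = [(y, w) for t, y, w in zip(treated, outcome, weights) if t == 1]
    c_vals = [(y, w) for t, y, w in zip(treated, outcome, weights) if t == 0]
    t_mean = weighted_mean([y for y, _ in t_vals], [w for _, w in t_vals])
    c_mean = weighted_mean([y for y, _ in c_vals], [w for _, w in c_vals])
    return t_mean - c_mean


# Hypothetical data: 1 = intervention period, 0 = control period.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.4, 0.3, 0.6]   # assumed, pre-estimated scores
outcome = [1, 0, 1, 0]              # 1 = adverse outcome occurred
print(round(weighted_risk_difference(treated, outcome, propensity), 3))  # -0.083
```

Patients whose scores are near 0 or 1 receive little weight, so the comparison concentrates on patients who could plausibly have appeared in either period.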

Results: Among 12 877 patients (9079 control, 3798 intervention), 13.3% were experiencing homelessness and 36.9% lived in the quintile with the highest neighborhood racialized and newcomer populations. Model sensitivity was 70.1% overall, with no significant variation across subgroups. Model specificity varied by age, <60 years: 93% (95% Confidence Interval [CI] 91-95%), 60-80 years: 90% (95%CI 87-92%), and >80 years: 84% (95%CI 79-88%), P < .001, but not other subgroups. CHARTwatch implementation was associated with an increase in code status documentation among patients experiencing homelessness, without significant differences in other care processes or outcomes.
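A subgroup comparison of this kind amounts to computing sensitivity and specificity separately within each group. The following sketch does so for invented records; the group labels, the record layout, and the function name are assumptions, and no statistical test or confidence interval is computed.

```python
from collections import defaultdict


def sensitivity_specificity_by_group(records):
    """records: iterable of (group, actual, alerted) tuples with 0/1 values."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, actual, alerted in records:
        if actual == 1:
            key = "tp" if alerted == 1 else "fn"
        else:
            key = "fp" if alerted == 1 else "tn"
        counts[group][key] += 1
    result = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        result[group] = {
            # None marks a group with no cases of that kind.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return result


# Hypothetical records: (age band, deteriorated, model alerted).
records = [
    ("under_60", 1, 1), ("under_60", 1, 0), ("under_60", 0, 0),
    ("under_60", 0, 0), ("under_60", 0, 0), ("under_60", 0, 1),
    ("over_80", 1, 1), ("over_80", 1, 1), ("over_80", 0, 0),
    ("over_80", 0, 1),
]
for group, metrics in sensitivity_specificity_by_group(records).items():
    print(group, metrics)
```

In this toy data the older band has lower specificity (0.5 vs 0.75), which mirrors only the direction of the pattern reported above, not its magnitude.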

Conclusion: CHARTwatch model performance and impact were generally consistent across measured sociodemographic subgroups. ML-based clinical decision support tools, and associated standardization of care, may reduce existing inequities, as was observed for code status orders among patients experiencing homelessness. This evaluation provides a framework for future bias assessments of deployed ML-CDS tools.

Citations: 0
Biomedical data repositories require governance for artificial intelligence/machine learning applications at every step.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-12-01 DOI: 10.1093/jamiaopen/ooaf134
Ellen Wright Clayton, Susannah Rose, Camille Nebecker, Laurie Novak, Yael Bensoussan, You Chen, Benjamin X Collins, Ashley Cordes, Barbara J Evans, Kadija S Ferryman, Samantha Hurst, Xiaoqian Jiang, Aaron Y Lee, Shannon McWeeney, Jillian Parker, Jean-Christophe Bélisle-Pipon, Eric Rosenthal, Zhijun Yin, Joseph Yracheta, Bradley Adam Malin

Objectives: The NIH's Bridge2AI Program has funded 4 "new flagship biomedical and behavioral datasets that are properly documented and ready for use with AI [artificial intelligence] or ML [machine learning] technologies" to promote the adoption of AI. This article discusses the challenges and lessons learned in data collection and governance to ensure their responsible use.

Materials and methods: We outline major steps involved in creating and using these datasets in ethically acceptable ways, including (1) data selection: what data are being selected and why, (2) increasing attention to public concerns, (3) the role of participant consent depending on data source, (4) ensuring responsible use, (5) where and how data are stored, (6) what control participants have over data sharing, (7) data access, and (8) data download.

Results: We discuss ethical, legal, social, and practical challenges raised at each step of creating AI-ready datasets, noting the importance of addressing issues of future data storage and use. We identify some of the many choices that these projects have made, including how to incorporate public input, where to store data, and how to define criteria for accessing and downloading data.

Discussion: The processes involved in the establishment and governance of the Bridge2AI datasets vary widely but have common elements, suggesting opportunities for future programs to lean upon Bridge2AI strategies.

Conclusions: This article discusses the challenges and lessons learned in data collection and governance to ensure their responsible use, particularly as confronted by the 4 distinct projects funded by this program.

Augmenting large language models to predict social determinants of mental health in opioid use disorder using patient clinical notes.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-11-27 eCollection Date: 2025-12-01 DOI: 10.1093/jamiaopen/ooaf142
Madhavi Pagare, Deva Sai Kumar Bheesetti, Inyene Essien-Aleksi, Mohammad Arif Ul Alam

Objective: Identifying social determinants of mental health (SDOMH) in patients with opioid use disorder (OUD) is crucial for estimating risk and enabling early intervention. Extracting such data from unstructured clinical notes is challenging due to annotation complexity and requires advanced natural language processing (NLP) techniques. We propose the Human-in-the-Loop Large Language Model Interaction for Annotation (HLLIA) framework, combined with a Multilevel Hierarchical Clinical-Longformer Embedding (MHCLE) algorithm, to annotate and predict SDOMH variables.

Materials and methods: We utilized 2636 annotated discharge summaries from the Medical Information Mart for Intensive Care (MIMIC-IV) dataset. High-quality annotations were ensured via a human-in-the-loop approach, refined using large language models (LLMs). The MHCLE algorithm performed multi-label classification of 13 SDOMH variables and was evaluated against baseline models, including RoBERTa, Bio_ClinicalBERT, ClinicalBERT, and ClinicalBigBird.

Results: The MHCLE model achieved superior performance with 96.29% accuracy and a 95.41% F1 score, surpassing baseline models. Training-testing policies P1, P2, and P3 yielded accuracies of 98.49%, 90.10%, and 89.04%, respectively, highlighting the importance of human intervention in refining LLM annotations.
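
To make the reported metrics concrete, the following is a minimal sketch of how accuracy and F1 can be computed for a multi-label task such as classifying 13 SDOMH variables. The abstract does not state which accuracy definition or F1 averaging was used, so subset (exact-match) accuracy and micro-averaged F1 are assumptions here, and the function name `multilabel_scores` and the toy label rows are hypothetical.

```python
from typing import List, Sequence, Tuple


def multilabel_scores(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Return (subset_accuracy, micro_f1) for rows of 0/1 label indicators.

    Assumption: each row holds one 0/1 flag per label, in the same order
    for the reference and the prediction.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    tp = fp = fn = 0
    exact_rows = 0
    for true_row, pred_row in zip(y_true, y_pred):
        # Subset accuracy counts a document only if every label matches.
        if list(true_row) == list(pred_row):
            exact_rows += 1
        # Micro-averaging pools the counts over all labels and documents.
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1

    subset_accuracy = exact_rows / len(y_true)
    denominator = 2 * tp + fp + fn
    micro_f1 = 2 * tp / denominator if denominator else 0.0
    return subset_accuracy, micro_f1


if __name__ == "__main__":
    # Toy data with 3 labels per note (illustrative only, not study data).
    reference: List[List[int]] = [[1, 0, 1], [0, 1, 0]]
    predicted: List[List[int]] = [[1, 0, 0], [0, 1, 0]]
    print(multilabel_scores(reference, predicted))
```

Subset accuracy is strict because one wrong label fails the whole note, whereas micro F1 gives partial credit per label; which of these the study reports would need to be checked in the full paper.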

Discussion and conclusion: Integrating the MHCLE model with the HLLIA framework offers an effective approach for predicting SDOMH factors from clinical notes, advancing NLP in OUD care. It highlights the importance of human oversight and sets a benchmark for future research.

Span-based annotation framework for LLM-based clinical named entity recognition: development and validation using Korean emergency department notes.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-11-26 eCollection Date: 2025-12-01 DOI: 10.1093/jamiaopen/ooaf157
Eun Hye Jang, Javier Aguirre, Sangji Lee, Hyeyoon Moon, Won Chul Cha

Objective: This study aims to develop and validate a span-based annotation framework for clinical named entity recognition (NER) using large language models (LLMs) based on Korean emergency department clinical notes.

Materials and methods: Two datasets with the same entity types but different annotation spans (word- vs phrase-level) were constructed, and the phrase-level dataset was further expanded into a doubled version. A Korean language-specific LLM was fine-tuned on each dataset, producing three variants that were compared with two baseline models: a few-shot LLM and a fine-tuned small language model (SLM). The final variant, fine-tuned on the doubled phrase-level dataset, was further evaluated against a human annotator.

Results: In all experimental settings, the three variants outperformed the baselines, achieving the highest F1 scores across all metrics. The final variant achieved F1 scores exceeding 0.80 across all averaging strategies and evaluation metrics, including token-based, span-based exact, and span-based partial evaluations, demonstrating robustness suitable for practical settings.
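
The difference between span-based exact and span-based partial evaluation can be illustrated with a short sketch. The abstract does not define its matching rules, so the criteria below are assumptions: an exact match requires identical boundaries and entity type, and a partial match requires the same entity type plus any character overlap. The function `span_f1`, the half-open `(start, end, label)` span format, and the entity labels are hypothetical.

```python
from typing import List, Tuple

# A span is (start_offset, end_offset_exclusive, entity_label).
Span = Tuple[int, int, str]


def _matches(gold: Span, pred: Span, partial: bool) -> bool:
    """Decide whether a predicted span counts as matching a gold span."""
    if gold[2] != pred[2]:
        return False  # entity types must agree in both modes
    if partial:
        # Half-open intervals overlap when each starts before the other ends.
        return gold[0] < pred[1] and pred[0] < gold[1]
    return gold[0] == pred[0] and gold[1] == pred[1]


def span_f1(gold: List[Span], pred: List[Span], partial: bool = False) -> float:
    """F1 over spans under exact or (assumed) overlap-based partial matching."""
    matched_pred = sum(any(_matches(g, p, partial) for g in gold) for p in pred)
    matched_gold = sum(any(_matches(g, p, partial) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy annotations (illustrative only): the second prediction clips
    # the start of the gold span, so it matches only in partial mode.
    gold_spans: List[Span] = [(0, 5, "SYMPTOM"), (10, 20, "DRUG")]
    pred_spans: List[Span] = [(0, 5, "SYMPTOM"), (12, 20, "DRUG")]
    print(span_f1(gold_spans, pred_spans, partial=False))
    print(span_f1(gold_spans, pred_spans, partial=True))
```

Phrase-level annotations tend to be longer than word-level ones, so boundary slips are more likely; reporting both modes separates boundary errors from outright misses.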

Discussion: While few-shot prompt engineering is widely adopted for LLM-based clinical NER, our results showed that supervised fine-tuning (SFT) was consistently superior. The final variant outperformed the human annotator, emphasizing its potential as an automatic labeling tool.

Conclusion: This study introduced a novel span-based annotation framework for LLM-based clinical NER, verified by three independent experiments. In multilingual and real-world clinical settings, LLMs have proven capable of handling complex entity spans that include word-level and phrase-level annotations, particularly for long and attribute-rich entities.

Artificial intelligence-generated draft replies to patient messages in pediatrics.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-11-22 eCollection Date: 2025-12-01 DOI: 10.1093/jamiaopen/ooaf159
April S Liang, Shivam Vedak, Alex Dussaq, Dong-Han Yao, Joshua A Villarreal, Sijo Thomas, Nicholas Chen, Tanya Townsend, Natalie M Pageler, Keith Morse

Objectives: This study describes the utilization and experiences of artificial intelligence (AI)-generated draft responses to patient messages among pediatric ambulatory clinicians and contextualizes their experiences in relation to those of adult specialty clinicians.

Materials and methods: A prospective pilot was conducted from September 2023 to August 2024 in 2 pediatric clinics (General Pediatric and Adolescent Medicine) and 2 obstetric clinics (Reproductive Endocrinology and Infertility and General Obstetrics) within an academic health system in Northern California. Participants included physician, nurse, and medical assistant volunteers. The intervention involved a feature utilizing large language models embedded in the electronic health record to generate draft responses. The proportion of AI-generated drafts used was collected, as were prepilot and follow-up surveys.

Results: A total of 61 clinicians (26 pediatric, 35 obstetric) enrolled, with 46 (75%) completing both surveys. Pediatric clinicians utilized 13.3% (95% CI, 12.3%-14.4%) of AI-generated drafts, and usage rates when responding to patients vs their proxies were similar (15% vs 12.9%, P = .24). Despite using AI-generated drafts significantly less than obstetric clinicians (18.3% [17.2%-19.5%], P < .0001), pediatric clinicians reported a significant reduction in perceived task load (NASA Task Load Index: 59.9 to 50.9, P = .04) and were more likely to recommend the tool (LTR: 7.0 vs 5.2, P = .04).
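
Utilization rates like those above are proportions reported with 95% confidence intervals. As a rough sketch of how such an interval can be obtained, the code below computes a normal-approximation (Wald) interval. The abstract reports neither the underlying draft counts nor the interval method, so the Wald method, the function name `wald_ci`, and the counts in the example are all assumptions for illustration.

```python
import math
from typing import Tuple


def wald_ci(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the approximation can overshoot near the edges.
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Hypothetical counts: 133 drafts used out of 1000 generated.
    low, high = wald_ci(133, 1000)
    print(f"{low:.3f} to {high:.3f}")
```

The Wald interval is the simplest choice and behaves reasonably for large samples with proportions away from 0 and 1; other methods (for example, the Wilson interval) are often preferred for small samples.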

Discussion and conclusion: Pediatric clinicians used AI-generated drafts at a rate within previously reported ranges in adult specialties and found them useful. These findings suggest this tool has potential for enhancing efficiency and reducing task load in pediatric care.
