
JMIR Medical Informatics: Latest Articles

Development of Quality Indicators for the Correct Use of Electronic Medical Records in Primary Care: Modified Delphi Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-19; DOI: 10.2196/80057
Rico Paridaens, Steve Van den Bulck, Michel De Jonghe, Benjamin Fauquert, Liesbeth Meel, Willem Raat, Bert Vaes

Background: When used correctly, electronic medical records (EMRs) can support clinical decision-making, provide information for research, facilitate coordination of care, reduce medical errors, and generate patient health summaries. Studies have reported large differences in the quality of EMR data.

Objective: Our study aimed to develop an evidence-based set of electronically extractable quality indicators (QIs) approved by expert consensus to assess the good use of EMRs by general practitioners (GPs) from a medical perspective.

Methods: The RAND-modified Delphi method was used in this study. The TRIP and MEDLINE databases were searched, and a selection of recommendations was filtered using the SMART (specific, measurable, assignable, realistic, and time-bound) principles. The panel comprised 12 GPs and 6 EMR developers. The selected recommendations were transformed into QIs expressed as percentages.
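As a rough illustration of what a QI expressed as a percentage can look like, the sketch below divides the patients who meet a criterion (numerator) by the patients to whom the criterion applies (denominator). The record fields and the indicator itself are hypothetical and are not taken from the study's indicator set.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified EMR record used only for illustration."""
    patient_id: str
    problem_list: list[str] = field(default_factory=list)
    on_antidiabetic_medication: bool = False


def qi_percentage(records, in_denominator, in_numerator):
    """Return a quality indicator as a percentage, or None if nobody is eligible."""
    eligible = [r for r in records if in_denominator(r)]
    if not eligible:
        return None
    met = sum(1 for r in eligible if in_numerator(r))
    return 100.0 * met / len(eligible)


records = [
    PatientRecord("p1", ["diabetes"], True),
    PatientRecord("p2", [], True),
    PatientRecord("p3", ["hypertension"], False),
    PatientRecord("p4", ["diabetes", "asthma"], True),
]

# Hypothetical indicator: among patients on antidiabetic medication,
# the share whose problem list contains a diabetes entry.
score = qi_percentage(
    records,
    in_denominator=lambda r: r.on_antidiabetic_medication,
    in_numerator=lambda r: "diabetes" in r.problem_list,
)
print(f"{score:.1f}%")
```

Returning `None` for an empty denominator keeps "not applicable" distinct from a genuine score of 0%, which matters when such scores feed an audit and feedback report.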

Results: A combined list of 20 indicators and 30 recommendations was created from 9 guidelines and 4 review articles. After the consensus round, 20 (100%) indicators and 20 (67%) recommendations were approved by the panel. All 20 recommendations were transformed into QIs. Most (16, 40%) QIs evaluated the completeness and adequacy of the problem list.
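The counts in the Results can be cross-checked with a few lines of arithmetic. The snippet below only restates the numbers given in the abstract; the variable names are ours.

```python
def pct(part, whole):
    """Percentage rounded to the nearest whole number, as in the abstract."""
    return round(100 * part / whole)


indicators_listed, recommendations_listed = 20, 30
indicators_approved, recommendations_approved = 20, 20

# Approved indicators plus approved recommendations (all transformed into QIs).
total_qis = indicators_approved + recommendations_approved
problem_list_qis = 16

print(pct(indicators_approved, indicators_listed))            # 100
print(pct(recommendations_approved, recommendations_listed))  # 67
print(total_qis)                                              # 40
print(pct(problem_list_qis, total_qis))                       # 40
```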

Conclusions: This study provided a set of 40 EMR-extractable QIs for the correct use of EMRs in primary care. These QIs can be used to map the completeness of EMRs by setting up an audit and feedback system, and to develop specific (computer-based) training for GPs.

Citations: 0
Development of Venous Thromboembolism Risk Prediction Models Based on Whole Blood Gene Expression Profiling Using 20 Machine Learning Algorithms: Comprehensive Analysis Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-16; DOI: 10.2196/75565
Yedong Huang, Xiaoyun Chen, Guannan Bai, Yajun Zhao, Dapeng Kuang, Lin Zhang, Wei Lu

Background: There is a lack of venous thromboembolism (VTE) risk prediction models based on gene expression information.

Objective: This study aimed to construct a VTE prediction model based on whole blood gene expression profiling, by performing a comprehensive analysis of 20 machine learning (ML) algorithms.

Methods: Two transcriptome datasets containing patients with VTE and healthy controls were obtained by searching the Gene Expression Omnibus database and used as the training and validation sets, respectively. Feature selection for model construction was performed on the training set using the least absolute shrinkage and selection operator and random forest, followed by the selection of the intersection of the chosen features. Subsequently, recursive feature elimination was applied to further refine the selected features. The selected features underwent model construction using 20 ML algorithms. The performance of the models was evaluated using various methods such as receiver operating characteristic and confusion matrix. The validation set was used for external model validation.
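The intersection step described in the Methods can be sketched as follows. The gene symbols and the two selector outputs are placeholders, not features reported by the study. In a real pipeline they would come from a fitted LASSO model (features with nonzero coefficients) and a random forest (top-ranked importances), and the shared features would then be passed on to recursive feature elimination.

```python
def intersect_features(lasso_selected, rf_ranked, rf_top_k):
    """Keep features chosen by LASSO that also rank in the random forest top k.

    The result preserves the random forest ranking order.
    """
    lasso_set = set(lasso_selected)
    return [f for f in rf_ranked[:rf_top_k] if f in lasso_set]


# Placeholder gene symbols, for illustration only.
lasso_selected = ["GENE_A", "GENE_C", "GENE_D", "GENE_F"]
rf_ranked = ["GENE_C", "GENE_B", "GENE_A", "GENE_E", "GENE_F", "GENE_D"]

shared = intersect_features(lasso_selected, rf_ranked, rf_top_k=4)
print(shared)  # ['GENE_C', 'GENE_A']
```

The cutoff `rf_top_k` is an assumption of this sketch; the abstract does not state how many random forest features were retained before the intersection.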

Results: The final results demonstrated that all algorithm models, except for k-nearest neighbor, exhibited good performance in VTE prediction. External validation data indicated that 9 algorithm models had an area under the curve greater than 0.75. The confusion matrix analysis revealed that the algorithm models maintained high specificity in the external validation cohort.
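For readers less familiar with the two metrics cited here, the following is a minimal, dependency-free sketch of how the area under the ROC curve and specificity are computed. The labels and scores are toy values, not study data, and the 0.5 decision threshold is an assumption of the example.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity(labels, predictions):
    """True negative rate: TN / (TN + FP)."""
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tn + fp == 0:
        raise ValueError("need at least one negative sample")
    return tn / (tn + fp)


# Toy data: 1 = VTE, 0 = healthy control.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(round(auc_score(labels, scores), 3))
print(specificity(labels, predictions))
```

The area under the curve is threshold-free, whereas specificity depends on the chosen cutoff, so a model can keep high specificity on an external cohort while its area under the curve drops.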

Conclusions: This study used 20 ML algorithms to construct VTE prediction models based on whole blood gene expression information, with 9 of these models demonstrating good diagnostic performance in external validation cohorts. The above models, when used in conjunction with D-dimer, may provide more valuable references for VTE diagnosis.

Citations: 0
Mild Cognitive Impairment Detection System Based on Unstructured Spontaneous Speech: Longitudinal Dual-Modal Framework.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-15; DOI: 10.2196/80883
Yu-Shan Liao, Thiri Wai, Ting-Yun Liao, Ho-Ling Chang, Yu-Ling Chang, Li-Chen Fu

Background: In recent years, the incidence of cognitive diseases has risen as the population ages. Alzheimer disease accounts for a substantial proportion of these diseases and places a heavy cost burden on health care systems. Diagnosing mild cognitive impairment (MCI), a transitional stage, is crucial for starting treatment early and slowing patient deterioration.

Objective: In this study, we use autobiographical memory (AM) test speech data to establish a dual-modal longitudinal cognitive detection system for MCI. The AM test is a psychological assessment method that evaluates the cognitive status of subjects as they freely narrate important life experiences.

Methods: Identifying hidden disease-related information in unstructured, spontaneous speech is more difficult than in structured speech. To improve this process, we use both speech and text data, which provide more clues about a person's cognitive state. In addition, to track how cognition changes over time in spontaneous speech, we introduce an aging trajectory module. This module uses local and global alignment loss functions to better learn time-related features by aligning cognitive changes across different time points.

Results: In our experiments on the Chinese dataset, the longitudinal model incorporating the aging trajectory module achieved an area under the receiver operating characteristic curve of 0.85 and 0.89 on 2 datasets, respectively, showing significant improvement over cross-sectional, single-time-point models. We also conducted ablation studies to verify the necessity of the proposed aging trajectory module. To confirm that the model applies beyond AM test data, we evaluated part of the model on the ADReSSo dataset, a single-time-point, semistructured dataset, with results showing an accuracy exceeding 0.88.

Conclusions: This study presents a noninvasive and scalable approach for early MCI detection by leveraging AM speech data across multiple time points. Through dual-modal analysis and the introduction of an aging trajectory module, our system effectively captures cognitive decline trends over time. Experimental results demonstrate the method's robustness and generalizability, highlighting its potential for real-world, long-term cognitive monitoring.

Citations: 0
Prompting and Fine-Tuning Large Language Models for Parkinson Disease Diagnosis: Comparative Evaluation Study Using the PPMI Structured Dataset.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-15; DOI: 10.2196/77561
Hyun-Ji Shin, Young Jin Jeong, Sungmin Jun, Do-Young Kang

Background: Parkinson disease (PD) presents diagnostic challenges due to its heterogeneous motor and nonmotor manifestations. Traditional machine learning (ML) approaches have been evaluated on structured clinical variables. However, the diagnostic utility of large language models (LLMs) using natural language representations of structured clinical data remains underexplored.

Objective: This study aimed to evaluate the diagnostic classification performance of multiple LLMs using natural language prompts derived from structured clinical data and to compare their performance with traditional ML baselines.

Methods: We reformatted structured clinical variables from the Parkinson's Progression Markers Initiative (PPMI) dataset into natural language prompts and used them as inputs for several LLMs. Variables with high multicollinearity were removed, and the top 10 features were selected using Shapley additive explanations (SHAP)-based feature ranking. LLM performance was examined across few-shot prompting, dual-output prompting that additionally generated post hoc explanatory text as an exploratory component, and supervised fine-tuning. Logistic regression (LR) and support vector machine (SVM) classifiers served as ML baselines. Model performance was evaluated using F1-scores on both the test set and a temporally independent validation set (temporal validation set) of limited size, and repeated output generation was carried out to assess stability.

Results: On the test set of 122 participants, LR and SVM trained on the 10 SHAP-selected clinical variables each achieved a macro-averaged F1-score of 0.960 (accuracy 0.975). LLMs receiving natural language prompts derived from the same variables reached comparable performance, with the best few-shot configurations achieving macro-averaged F1-scores of 0.987 (accuracy 0.992). In the temporal validation set of 31 participants, LR maintained a macro-averaged F1-score of 0.903, whereas SVM showed substantial performance degradation. In contrast, multiple LLMs sustained high diagnostic performance, reaching macro-averaged F1-scores up to 0.968 and high recall for PD. Repeated output generation across LLM conditions produced generally stable predictions, with rare variability observed across runs. Under dual-output prompting, diagnostic performance showed a reduction relative to few-shot prompting while remaining generally stable. Supervised fine-tuning of lightweight models improved stability and enabled GPT-4o-mini to achieve a macro-averaged F1-score of 0.987 on the test set, with uniformly correct predictions observed in the small temporal validation set, which should be interpreted cautiously given the limited sample size and exploratory nature of the evaluation.

Conclusions: This study provides an exploratory benchmark of how modern LLMs process structured clinical variables in natural language form. Although several models achieved diagnostic performance comparable to LR on the test and temporal validation datasets, their outputs were sensitive to prompt format, model selection, and class distribution. Occasional variability across repeated output generations reflects the stochastic nature of LLMs, and lightweight models required supervised fine-tuning to achieve stable generalization. These findings highlight both the capabilities and the limitations of current LLMs in handling tabular clinical information and underscore the need for cautious application and further research.
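Because this abstract reports macro-averaged F1 alongside accuracy, a small sketch of how the two differ may help. The labels below are toy values, not study data. The final line only checks that the reported accuracies of 0.975 and 0.992 are consistent with 3 and 1 misclassified participants out of 122, assuming accuracy was computed over the full test set.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels: 8 PD cases and 2 controls, with one control misclassified as PD.
y_true = ["PD"] * 8 + ["control"] * 2
y_pred = ["PD"] * 8 + ["PD", "control"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 3))                  # 0.9
print(round(macro_f1(y_true, y_pred), 3))  # 0.804

# Accuracy for 3 and 1 errors among 122 test participants.
print(round(119 / 122, 3), round(121 / 122, 3))  # 0.975 0.992
```

With imbalanced classes, a single error in the minority class lowers macro-averaged F1 far more than accuracy, which is one reason both figures are worth reporting.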
Citations: 0
Developing a Suicide Risk Prediction Algorithm Using Electronic Health Record Data in Mental Health Care: Real-World Case Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-14; DOI: 10.2196/74240
Linda Hummel, Karin C A G Lorenz-Artz, Joyce J P A Bierbooms, Inge M B Bongers

Background: Artificial intelligence (AI) offers potential solutions to address the challenges faced by a strained mental health care system, such as increasing demand for care, staff shortages, and pressured accessibility. While developing AI-based tools for clinical practice is technically feasible and has the potential to produce real-world impact, only a few are actually implemented into clinical practice. Implementation starts at the algorithm development phase, as this phase bridges theoretical innovation and practical application. The design and the way the AI tool is developed may either facilitate or hinder later implementation and use.

Objective: This study aims to examine the development process of a suicide risk prediction algorithm using real-world electronic health record (EHR) data through a qualitative case study approach for clinical use in mental health care. It explores which challenges the development team encountered in creating the algorithm and how they addressed these challenges. This study identifies key considerations for the integration of technical and clinical perspectives in algorithms, facilitating the evolution of mental health organizations toward data-driven practice. The studied algorithm remains exploratory and has not yet been implemented in clinical practice.

Methods: An exploratory, multimethod qualitative case study was conducted, using a hybrid approach with both inductive and deductive analysis. Data were collected through desk research, reflective team meetings, and iterative feedback sessions with the development team. Thematic analysis was used to identify development challenges and the team's responses. Based on these findings, key considerations for future algorithm development were derived.

Results: Key challenges included defining, operationalizing, and measuring suicide incidents within EHRs due to issues such as missing data, underreporting, and differences between data sources. Predicting factors were identified by consulting clinical experts; however, psychosocial variables had to be constructed as they could not directly be extracted from EHR data. Risk of bias occurred when traditional suicide prevention questionnaires, unequally distributed across patients, were used as input. Analyzing unstructured data by natural language processing was challenging due to data noise, but ultimately enabled successful sentiment analysis, which provided dynamic, clinically relevant information for the algorithm. A complex model enhanced predictive accuracy but posed challenges regarding understandability, which was highly valued by clinicians.

Conclusions: To advance mental health care as a data-driven field, several critical considerations must be addressed: ensuring robust data governance and quality, fostering cultural shifts in data documentation practices, establishing mechanisms for continuous monitoring of AI tool use, mitigating risks of bias, balancing predictive performance with interpretability, and maintaining a clinician-in-the-loop approach. Future research should prioritize the sociotechnical aspects of the development, implementation, and daily use of AI in mental health care practice.
Citations: 0
Nomograms Based on X-Ray Radiomics for Predicting Pain Progression in Knee Osteoarthritis Using Data From the Foundation for the National Institutes of Health: Development and Validation Study.
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2026-01-14 DOI: 10.2196/78338
Yingwei Sun, Jing Liu, Chunbo Deng, Chengbao Peng, Shinong Pan, Xueyong Liu

Background: Knee osteoarthritis (KOA) is one of the most prevalent chronic musculoskeletal disorders among the older adult population. Screening populations at risk of rapid progression of osteoarthritis and implementing appropriate early intervention strategies is advantageous for the treatment and prognosis of affected patients.

Objective: This study aimed to construct and validate a nomogram model based on x-ray radiomics to effectively identify individuals experiencing progression of KOA pain.

Methods: The Foundation for the National Institutes of Health Biomarkers Consortium included a total of 600 participants, who were classified as pain progressors (n=297, 49.5%) or non-pain progressors (n=303, 50.5%) according to whether their Western Ontario and McMaster Universities Osteoarthritis Index pain score increased by ≥9 points (on a scale from 0 to 100) during the follow-up period of 24 to 48 months. X-rays that lacked defined spacing in the DICOM image were excluded. Subchondral bone regions on the inner and outer edges of the tibia and femur were selected fully automatically as regions of interest, and radiomics features were extracted for different combinations of these regions. Least absolute shrinkage and selection operator regression was used to select features and generate a radiomics score, with Shapley additive explanations used for interpretability. The radiomics score, along with clinical indicators, was incorporated into nomograms using a multivariable logistic regression model. The subgroup analysis focused solely on the progression of pain and cases with no progression at all. The receiver operating characteristic curve, along with calibration and decision curves, was used to assess the discriminative performance.
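
The labeling rule in the Methods can be illustrated with a minimal Python sketch. It assumes the score change is computed as follow-up minus baseline; the abstract does not say how the change was measured within the 24 to 48 month window. The `Participant` class, its field names, and both helper functions are hypothetical names introduced only for illustration and are not taken from the study.

```python
from dataclasses import dataclass

# Threshold stated in the abstract: an increase of at least 9 points
# on the 0-100 WOMAC pain scale.
PAIN_INCREASE_THRESHOLD = 9.0


@dataclass
class Participant:
    participant_id: str    # hypothetical identifier
    baseline_pain: float   # WOMAC pain score at baseline (0-100)
    follow_up_pain: float  # WOMAC pain score at follow-up (0-100); assumed single value


def is_pain_progressor(p: Participant) -> bool:
    """Return True when the pain score rose by at least the threshold."""
    return (p.follow_up_pain - p.baseline_pain) >= PAIN_INCREASE_THRESHOLD


def group_share(n_group: int, n_total: int) -> float:
    """Percentage of the cohort in a group, rounded to one decimal place."""
    return round(100.0 * n_group / n_total, 1)
```

With the counts reported in the abstract, `group_share(297, 600)` gives 49.5 and `group_share(303, 600)` gives 50.5.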

Results: A total of 450 participants were included in the study. Shapley additive explanations analysis identified Wavelet-HH_gldm_HighGrayLevelEmphasis as the primary radiomics feature. Nomogram 1 and nomogram 2 for predicting KOA pain progression achieved area under the curve values of 0.766 and 0.753, respectively, with mean absolute errors of 0.012 and 0.008, respectively, in the calibration curves. Decision curve analysis showed a positive net benefit across a range of threshold probabilities. In subgroup analyses, nomogram 3 and nomogram 4 yielded areas under the curve of 0.795 and 0.740, respectively.

Conclusions: The nomograms based on x-ray radiomics demonstrated excellent predictive capability and accuracy in forecasting the progression of KOA pain.

Citations: 0
Large Language Models for Psychiatric Diagnosis Based on Multicenter Real-World Clinical Records: Comparative Study.
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2026-01-13 DOI: 10.2196/77699
Maoqian Sun, Jia Yu, Zhuhong Long, Yun Yang, Tao Xiao, Jiaquan Liang, Jun Feng, Huaili Deng, Guoping Huang

Background: Psychiatric disorders are diagnostically challenging and often rely on subjective clinical judgment, particularly in resource-limited settings. Large language models (LLMs) have demonstrated potential in supporting psychiatric diagnosis; however, robust evidence from large-scale, real-world clinical data remains limited.

Objective: This study aimed to evaluate and compare the diagnostic performance of multiple LLMs for psychiatric disorders using multicenter real-world electronic health records (EHRs).

Methods: We retrospectively analyzed 9923 inpatient EHRs collected from 6 psychiatric centers across China, encompassing all ICD-10 (International Statistical Classification of Diseases, Tenth Revision) psychiatric categories. In total, 3 LLMs (GPT-4.0 [OpenAI], GPT-3.5 [OpenAI], and GLM-4-Plus [Zhipu AI]) were evaluated against physician-confirmed discharge diagnoses. Diagnostic performance was assessed using strict accuracy criteria and lenient classification metrics, with subgroup analyses conducted across diagnostic categories and age groups.
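
The abstract does not define its strict and lenient criteria. As a rough sketch only, the Python below assumes that "strict" means an exact match on the 3-character ICD-10 category and that "lenient" means a match on the ICD-10 block, approximated here by the first two characters of the code (for example, F32 and F33 both fall in F30-F39). The function names are hypothetical, and the weighted F1-score follows the usual support-weighted definition rather than anything confirmed by the study.

```python
from collections import Counter
from typing import Sequence


def strict_accuracy(truth: Sequence[str], pred: Sequence[str]) -> float:
    """Share of records whose predicted code equals the reference code exactly."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def to_block(code: str) -> str:
    """Collapse a code such as 'F32' to its first two characters ('F3').

    Assumption: this stands in for the lenient, block-level comparison.
    """
    return code[:2]


def weighted_f1(truth: Sequence[str], pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's support in truth."""
    support = Counter(truth)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(truth, pred))
        fp = sum(t != label and p == label for t, p in zip(truth, pred))
        fn = n - tp
        # tp + fn equals n, which is at least 1, so the denominator is never zero.
        total += n * (2 * tp / (2 * tp + fp + fn))
    return total / len(truth)
```

For the invented lists `truth = ["F32", "F20", "F31", "F41"]` and `pred = ["F33", "F20", "F31", "F32"]` (not study data), `strict_accuracy` returns 0.5, while `weighted_f1` applied to the block-level codes returns 0.65, which shows how a lenient metric can sit well above a strict one on the same predictions.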

Results: GPT-4.0 achieved the highest overall strict diagnostic accuracy (71.7%) and the highest weighted F1-score under lenient evaluation (0.881), particularly for high-prevalence disorders, such as mood disorders and schizophrenia spectrum disorders. Diagnostic performance varied across age groups, with the highest accuracy observed in older adult patients (up to 79.5%) and lower accuracy in adolescents. Across centers, model performance remained stable, with no significant intercenter differences.

Conclusions: LLMs, especially GPT-4.0, demonstrate promising capability in supporting psychiatric diagnosis using real-world EHRs. However, diagnostic performance varies by age group and disorder category. LLMs should be regarded as assistive tools rather than replacements for clinical judgment, and further validation is needed before routine clinical implementation.

Citations: 0
Neutrophil Percentage-to-Albumin Ratio as a Novel Prognostic Biomarker in Adult Diffuse Gliomas: Retrospective Study Integrating 3 Machine Learning Models and Cox Regression.
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2026-01-13 DOI: 10.2196/79945
Congcong Zhu, Jiyang An, Lili Zhou

Background: Adult-type diffuse glioma (ADG) is the most common primary malignant tumor of the central nervous system. Its highly invasive nature, marked heterogeneity, and resistance to therapy contribute to a high risk of recurrence and poor prognosis. At present, the lack of reliable prognostic tools poses a significant barrier to the development of individualized treatment strategies.

Objective: This study aimed to develop an effective prognostic model for ADG by integrating multiple machine learning algorithms, in order to enhance the precision of individualized clinical decision-making.

Methods: In this retrospective study, 160 newly diagnosed patients with ADG who underwent surgical resection and histopathological confirmation at our institution between June 2019 and September 2021 were included. A total of 32 variables, including clinical characteristics, molecular biomarkers, and preoperative hematological indicators, were collected. Overall survival (OS) and progression-free survival (PFS) were defined as the study endpoints. Feature selection was performed using least absolute shrinkage and selection operator regression, extreme gradient boosting, and random forest algorithms. Kaplan-Meier survival curves and log-rank tests were used for survival analysis. Multivariate Cox proportional hazards models were constructed to identify independent prognostic factors, and nomograms were developed accordingly. The model's discriminative ability, calibration, and clinical utility were evaluated using the concordance index, area under the receiver operating characteristic curve (area under the curve), calibration plots, and Kaplan-Meier analysis.
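
The concordance index named in the Methods can be sketched in a few lines of Python. The abstract does not state which variant was used; the version below is Harrell's C under simplifying assumptions (pairs with tied survival times are skipped, and tied risk scores count as half-concordant). The function name and argument names are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- model risk score; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event has the higher risk score
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For the invented inputs `times = [5, 10, 15, 20]`, `events = [1, 1, 0, 1]`, and `risks = [0.9, 0.4, 0.5, 0.1]`, there are 5 comparable pairs, 4 of them concordant, so the function returns 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.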

Results: Age, neutrophil percentage-to-albumin ratio (NPAR), and platelet-to-mean platelet volume ratio were identified as independent prognostic factors for OS, while age and NPAR were independent predictors for PFS (all P<.001). The prognostic models based on these variables demonstrated good predictive performance, with concordance index values of 0.731 and 0.763 for the training and validation cohorts in the OS model, respectively. The PFS model also showed robust performance. Area under the curve values and calibration curves further supported the models' accuracy and stability. Risk stratification analysis revealed clear survival differences between risk groups (all P<.05), indicating strong clinical applicability.

Conclusions: This study is the first to identify preoperative NPAR as a significant prognostic biomarker for ADG using machine learning approaches. The prognostic model incorporating NPAR, platelet-to-mean platelet volume ratio, and age demonstrated favorable predictive performance, offering a novel perspective for accurate risk stratification and personalized treatment in patients with ADG.

Citations: 0
Exploring Factors Associated With the Stalled Implementation of a Ground-Up Electronic Health Record System in South Africa: Qualitative Insights From the E-Tick Case Study Using the Consolidated Framework for Implementation Research (CFIR).
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2026-01-12 DOI: 10.2196/73831
Campion Zharima, Frances Griffiths, Jane Goudge

Background: Electronic health records (EHRs) have the potential to improve service delivery through record keeping and monitoring health outcomes. As countries move toward universal health coverage, digital health tools such as EHRs are essential for achieving this goal. However, EHR implementation in middle-income countries like South Africa faces obstacles.

Objective: This study explores the reasons behind a stalled implementation of the electronic tick register (E-tick) system (an electronic version of a paper primary health care register to record services provided), using the Consolidated Framework for Implementation Research.

Methods: Using a qualitative design, in-depth interviews were conducted with 38 participants to explore their perceptions and experiences, and the factors surrounding the success and stalling of the E-tick. Participants included managers, stakeholders, implementers, and end users from the 3 implementation clinics. Data were collected using semistructured interview guides. Thematic analysis and framework analysis based on the Consolidated Framework for Implementation Research domains (innovation, inner setting, individual characteristics, implementation process, and outer setting) were applied.

Results: The E-tick system was designed to improve data quality in paper health registers, addressing inaccuracies in reporting to district and provincial health departments (innovation domain). Implementers iteratively developed the system through user input from managers and clinicians, and stakeholder engagement of software developers, funders, health managers, and decision-makers from the provincial health department (individual characteristics). Although the system was initially well adopted by end users, it stalled primarily due to outer setting factors, which included a change of developers, funding cuts, and limited support at the provincial health department level due to capacity gaps, political appointments, and mistrust stemming from corruption and abuse of the tender system. Moreover, resistance to leveraging lessons from locally developed small-scale systems further constrained institutional support for the E-tick.

Conclusions: Although successful implementation of EHRs can be facilitated by strong user engagement and co-design, outer setting factors such as governance, funding, and policy alignment can pose significant threats to sustainability. This underscores the importance of effective synergy between top-down and bottom-up processes for successful implementation.

Citations: 0
Ethical Imperatives for Retrieval-Augmented Generation in Clinical Nursing: Viewpoint on Responsible AI Use.
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2026-01-09 DOI: 10.2196/79922
Xinyi Tu, Chenghao Shi, Peilin Qian, Lizhu Wang

Unlabelled: Retrieval-augmented generation (RAG) systems have emerged as a powerful technique to enhance the capabilities of large language models by enabling them to access external, up-to-date knowledge in real time, and RAG systems are being increasingly adopted by researchers in the medical field. In this viewpoint article, we explore the ethical imperatives for implementing RAG systems in clinical nursing environments, with particular attention to how these technologies affect patient care quality and safety. The purpose of this paper is to examine the ethical risks introduced by RAG-enhanced large language models in clinical nursing and to propose strategic guidelines for their responsible implementation. Key considerations include ensuring accuracy, fairness, transparency, and accountability, as well as maintaining essential human oversight, as discussed through a structured analysis. We argue that robust data governance, explainable artificial intelligence (AI) techniques, and continuous monitoring are critical components of a responsible RAG implementation strategy. Ultimately, realizing the benefits of RAG while mitigating ethical concerns requires sustained collaboration among health care professionals, AI developers, and policymakers, fostering a future where AI supports patient safety, reduces disparities, and improves the quality of nursing care.
