
JMIR AI: Latest Articles

Personalization of AI Using Personal Foundation Models Can Lead to More Precise Digital Therapeutics.
IF 2 | Pub Date: 2025-08-21 | DOI: 10.2196/55530
Peter Washington

Digital health interventions often use machine learning (ML) models to make predictions of repeated adverse health events. For example, models may be used to analyze patient data to identify patterns that can anticipate the likelihood of disease exacerbations, enabling timely interventions and personalized treatment plans. However, many digital health applications require the prediction of highly heterogeneous and nuanced health events. The cross-subject variability of these events makes traditional ML approaches, where a single generalized model is trained to classify a particular condition, unlikely to generalize to patients outside of the training set. A natural solution is to train a separate model for each individual or subgroup, essentially overfitting the model to the unique characteristics of the individual without negatively overfitting in terms of the desired prediction task. Such an approach has traditionally required extensive data labels from each individual, a reality that has rendered personalized ML infeasible for precision health care. The recent popularization of self-supervised learning, however, provides a solution to this issue: by pretraining deep learning models on the vast array of unlabeled data streams arising from patient-generated health data, personalized models can be fine-tuned to predict the health outcome of interest with fewer labels than purely supervised approaches, making personalization of deep learning models much more achievable from a practical perspective. This perspective describes the current state-of-the-art in both self-supervised learning and ML personalization for health care as well as growing efforts to combine these two ideas by conducting self-supervised pretraining on an individual's data. However, there are practical challenges that must be addressed in order to fully realize this potential, such as human-computer interaction innovations to ensure consistent labeling practices within a single participant.

JMIR AI, vol. 4, e55530 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12411786/pdf/
Citations: 0
A Real-Time Signal-Based Wavelet Long Short-Term Memory Method for Length-of-Stay Prediction for the Intensive Care Unit: Development and Evaluation Study.
IF 2 | Pub Date: 2025-08-20 | DOI: 10.2196/71247
Yiqun Jiang, Qing Li, Wenli Zhang

Background: Efficient allocation of health care resources is essential for long-term hospital operation. Effective intensive care unit (ICU) management is essential for alleviating the financial strain on health care systems. Accurate prediction of length-of-stay in ICUs is vital for optimizing capacity planning and resource allocation, with the challenge of achieving early, real-time predictions.

Objective: This study aimed to develop a predictive model, namely wavelet long short-term memory model (WT-LSTM), for ICU length-of-stay using only real-time vital sign data. The model is designed for urgent care settings where demographic and historical patient data or laboratory results may be unavailable; the model leverages real-time inputs to deliver early and accurate ICU length-of-stay predictions.

Methods: The proposed model integrates discrete wavelet transformation and long short-term memory (LSTM) neural networks to filter noise from patients' vital sign series and improve length-of-stay prediction accuracy. Model performance was evaluated using the electronic ICU database, focusing on 10 common ICU admission diagnoses in the database.

Results: The results demonstrate that WT-LSTM consistently outperforms baseline models, including linear regression, LSTM, and bidirectional long short-term memory, in predicting ICU length-of-stay using vital sign data, achieving significant improvements in mean square error. Specifically, the wavelet transformation component of the model enhances the overall performance of WT-LSTM. Removing this component results in an average decrease of 3.3% in mean square error; such a phenomenon is particularly pronounced in specific patient cohorts. The model's adaptability is highlighted through real-time predictions using only 3-hour, 6-hour, 12-hour, and 24-hour input data. Using only 3 hours of input data, the WT-LSTM model delivers competitive results across the 10 most common ICU admission diagnoses, often outperforming Acute Physiology and Chronic Health Evaluation IV, the leading ICU outcome prediction system currently implemented in clinical practice. WT-LSTM effectively captures patterns from vital signs recorded during the initial hours of a patient's ICU stay, making it a promising tool for early prediction and resource optimization in the ICU.

Conclusions: Our proposed WT-LSTM model, based on real-time vital sign data, offers a promising solution for ICU length-of-stay prediction. Its high accuracy and early prediction capabilities hold significant potential for enhancing clinical practice, optimizing resource allocation, and supporting critical clinical and administrative decisions in ICU management.
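
The noise-filtering step described in the Methods can be illustrated with a minimal sketch. The abstract does not state the wavelet family, the number of decomposition levels, or the thresholding rule, so the single-level Haar transform, the soft threshold, and the sample heart-rate values below are all assumptions made for illustration; the LSTM stage is omitted.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (even-length input)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert haar_dwt: rebuild the signal from its two coefficient lists."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(signal, threshold):
    """Assumed denoising rule: shrink only the detail (high-frequency) part."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))

# Hypothetical heart-rate samples, not data from the study.
heart_rate = [80.0, 82.0, 81.0, 95.0, 83.0, 82.0, 84.0, 83.0]
smoothed = denoise(heart_rate, threshold=2.0)
print(smoothed)
```

In a pipeline of the kind the abstract describes, a smoothed series like this would then be windowed (for example, the first 3 hours of the stay) and passed to the sequence model.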

JMIR AI, vol. 4, e71247 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12367335/pdf/
Citations: 0
Deep Learning Multi-Modal Melanoma Detection: Algorithm Development and Validation.
IF 2 | Pub Date: 2025-08-13 | DOI: 10.2196/66561
Nithika Vivek, Karthik Ramesh

Background: The visual similarity of melanoma and seborrheic keratosis has made it difficult for older patients with disabilities to know when to seek medical attention, contributing to the metastasis of melanoma.

Objective: This study aimed to present a novel multimodal deep learning-based technique to distinguish between melanoma and seborrheic keratosis.

Methods: Our strategy is three-fold: (1) use patient image data to train and test three deep learning models using transfer learning (ResNet50, InceptionV3, and VGG16) and one author-designed model, (2) use patient metadata to train and test a deep learning model, and (3) combine the predictions of the image model with the best accuracy and the metadata model, using nonlinear least squares regression to specify ideal weights to each model for a combined prediction.

Results: The accuracy of the combined model was 88% (195/221 classified correctly) on test data from the HAM10000 dataset. Model reliability was assessed by visualizing the output activation map of each model and comparing the diagnosis patterns to those of dermatologists. The addition of metadata to the image dataset was key to reducing the false-negative and false-positive rates simultaneously, thereby producing better metrics and improving overall model accuracy.

Conclusions: Results from this experiment could be used to eliminate late diagnosis of melanoma via easy access to an app. Future experiments can use text data (subjective data pertaining to how the patient felt over a certain period of time) to allow this model to reflect the real hospital setting to a greater extent.
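
Step (3) of the Methods, weighting the image model against the metadata model, can be sketched as follows. The authors report using nonlinear least squares regression; the version below is a deliberately simplified linear least-squares stand-in with a single weight `w`, and the probabilities and labels are hypothetical values, not data from the study.

```python
def fit_weight(p_image, p_meta, labels):
    """Least-squares weight w for w * p_image + (1 - w) * p_meta ~ labels.

    Closed-form solution of the one-parameter problem, clipped to [0, 1].
    This is an assumed simplification, not the authors' procedure.
    """
    num = sum((a - b) * (y - b) for a, b, y in zip(p_image, p_meta, labels))
    den = sum((a - b) ** 2 for a, b in zip(p_image, p_meta))
    if den == 0:
        return 0.5  # both models agree everywhere; any weight works
    return min(1.0, max(0.0, num / den))

def combine(p_image, p_meta, w):
    """Weighted average of the two models' melanoma probabilities."""
    return [w * a + (1 - w) * b for a, b in zip(p_image, p_meta)]

# Hypothetical validation-set probabilities (1 = melanoma).
p_image = [0.9, 0.4, 0.7, 0.2]
p_meta = [0.6, 0.1, 0.8, 0.5]
labels = [1, 0, 1, 0]

w = fit_weight(p_image, p_meta, labels)
combined = combine(p_image, p_meta, w)
predictions = [int(p >= 0.5) for p in combined]
print(w, predictions)
```

The weight would normally be fitted on held-out validation data and then frozen before scoring the test set, so that the reported accuracy is not inflated by tuning on test cases.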

JMIR AI, vol. 4, e66561 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12346184/pdf/
Citations: 0
AI-Supported Shared Decision-Making (AI-SDM): Conceptual Framework.
IF 2 | Pub Date: 2025-08-07 | DOI: 10.2196/75866
Mohammed As'ad, Nawarh Faran, Hala Joharji

Shared decision-making is central to patient-centered care but is often hampered by artificial intelligence (AI) systems that focus on technical transparency rather than delivering context-rich, clinically meaningful reasoning. Although AI explainability methods elucidate how decisions are made, they fall short of addressing the "why" that supports effective patient-clinician dialogue. To bridge this gap, we introduce artificial intelligence-supported shared decision-making (AI-SDM), a conceptual framework designed to integrate AI-based reasoning into shared decision-making to enhance care quality while preserving patient autonomy. AI-SDM is a structured, multimodel framework that synthesizes predictive modeling, evidence-based recommendations, and generative AI techniques to produce adaptive, context-sensitive explanations. The framework distinguishes conventional AI explainability from AI reasoning, prioritizing the generation of tailored, narrative justifications that inform shared decisions. A hypothetical clinical scenario in stroke management is used to illustrate how AI-SDM facilitates an iterative, triadic deliberation process between health care providers, patients, and AI outputs. This integration is intended to transform raw algorithmic data into actionable insights that directly support the decision-making process without supplanting human judgment.

JMIR AI, vol. 4, e75866 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12331219/pdf/
Citations: 0
Assessing Revisit Risk in Emergency Department Patients: Machine Learning Approach.
IF 2 | Pub Date: 2025-08-07 | DOI: 10.2196/74053
Wang-Chuan Juang, Zheng-Xun Cai, Chia-Mei Chen, Zhi-Hong You

Background: Overcrowded emergency rooms might degrade the quality of care and overload the clinic staff. Assessing unscheduled return visits (URVs) to the emergency department (ED) is a quality assurance procedure to identify ED-discharged patients with a high likelihood of bounce-back, to ensure patient safety, and ultimately to reduce medical costs by decreasing the frequency of URVs. The field of machine learning (ML) has evolved considerably in the past decades, and many ML applications have been deployed in various contexts.

Objective: This study aims to develop an ML-assisted framework that identifies high-risk patients who may revisit the ED within 72 hours after the initial visit. Furthermore, this study evaluates different ML models, feature sets, and feature encoding methods in order to build an effective prediction model.

Methods: This study proposes an ML-assisted system that extracts the features from both structured and unstructured medical data to predict patients who are likely to revisit the ED, where the structured data includes patients' electronic health records, and the unstructured data is their medical notes (subjective, objective, assessment, and plan). A 5-year dataset consisting of 184,687 ED visits, along with 324,111 historical electronic health records and the associated medical notes, was obtained from Kaohsiung Veterans General Hospital, a tertiary medical center in Taiwan, to evaluate the proposed system.

Results: The evaluation results indicate that incorporating convolutional neural network-based feature extraction from unstructured ED physician narrative notes, combined with structured vital signs and demographic data, significantly enhances predictive performance. The proposed approach achieves an area under the receiver operating characteristic curve of 0.705 and a recall of 0.718, demonstrating its effectiveness in predicting URVs. These findings highlight the potential of integrating structured and unstructured clinical data to improve predictive accuracy in this context.

Conclusions: The study demonstrates that an ML-assisted framework may be applied as a decision support tool to assist ED clinicians in identifying revisiting patients, although the model's performance may not be sufficient for clinic implementation. Given the improvement in the area under the receiver operating characteristic curve, the proposed framework should be further explored as a workable decision support tool to pinpoint ED patients with a high risk of revisit and provide them with appropriate and timely care.
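
The two metrics reported in the Results can be made concrete with a short sketch. The area under the receiver operating characteristic curve is computed here as the probability that a randomly chosen revisiting patient receives a higher score than a randomly chosen non-revisiting patient, and recall as the share of actual revisits flagged at a chosen threshold. The labels, scores, and the 0.5 threshold are hypothetical; the abstract does not state the operating threshold.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def recall(labels, scores, threshold):
    """Fraction of true revisits whose score reaches the threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)

# Hypothetical model outputs (1 = revisit within 72 hours).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.8, 0.3, 0.4, 0.5, 0.2, 0.9]

auc = roc_auc(labels, scores)
rec = recall(labels, scores, threshold=0.5)
print(auc, rec)
```

Recall depends on the threshold while the AUC does not, which is why the two numbers are usually read together when judging whether a screening model is ready for deployment.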

JMIR AI, vol. 4, e74053 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12332214/pdf/
Citations: 0
Training Language Models for Estimating Priority Levels in Ultrasound Examination Waitlists: Algorithm Development and Validation.
IF 2 | Pub Date: 2025-07-22 | DOI: 10.2196/68020
Kanato Masayoshi, Masahiro Hashimoto, Naoki Toda, Hirozumi Mori, Goh Kobayashi, Hasnine Haque, Mizuki So, Masahiro Jinzaki

Background: Ultrasound examinations, while valuable, are time-consuming and often limited in availability. Consequently, many hospitals implement reservation systems; however, these systems typically lack prioritization for examination purposes. Hence, our hospital uses a waitlist system that prioritizes examination requests based on their clinical value when slots become available due to cancellations. This system, however, requires a manual review of examination purposes, which are recorded in free-form text. We hypothesized that artificial intelligence language models could preliminarily estimate the priority of requests before manual reviews.

Objective: This study aimed to investigate potential challenges associated with using language models for estimating the priority of medical examination requests and to evaluate the performance of language models in processing Japanese medical texts.

Methods: We retrospectively collected ultrasound examination requests from the waitlist system at Keio University Hospital, spanning January 2020 to March 2023. Each request comprised an examination purpose documented by the requesting physician and a 6-tier priority level assigned by a radiologist during the clinical workflow. We fine-tuned JMedRoBERTa, Luke, OpenCalm, and LLaMA2 under two conditions: (1) tuning only the final layer and (2) tuning all layers using either standard backpropagation or low-rank adaptation.

Results: After cleaning, the training and test datasets contained 2335 and 204 requests, respectively. When only the final layers were tuned, JMedRoBERTa outperformed the other models (Kendall coefficient=0.225). With full fine-tuning, JMedRoBERTa continued to perform best (Kendall coefficient=0.254), though with reduced margins compared with the other models. The radiologist's retrospective re-evaluation yielded a Kendall coefficient of 0.221.

Conclusions: Language models can estimate the priority of examination requests with accuracy comparable with that of human radiologists. The fine-tuning results indicate that general-purpose language models can be adapted to domain-specific texts (ie, Japanese medical texts) with sufficient fine-tuning. Further research is required to address priority rank ambiguity, expand the dataset across multiple institutions, and explore more recent language models with potentially higher performance or better suitability for this task.
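
The Kendall coefficient used to compare model and radiologist rankings can be sketched as below. The abstract does not say which variant was used; tau-b is assumed here because a 6-tier scale produces many tied pairs. The priority lists are hypothetical.

```python
import math

def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation, with correction for ties.

    Pairs tied in both lists are ignored; pairs tied in only one list
    enter the denominator but not the numerator.
    """
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else 0.0

# Hypothetical 6-tier priority levels for six requests.
radiologist = [1, 2, 2, 4, 5, 6]
model = [1, 3, 2, 4, 6, 5]

tau = kendall_tau_b(radiologist, model)
print(tau)
```

A value of 1 means the two rankings order every untied pair the same way and -1 means they order every pair oppositely, so coefficients near 0.22 to 0.25, as reported above, indicate weak agreement for both the models and the re-evaluating radiologist.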

Citations: 0
Natural Language Processing for Identification of Hospitalized People Who Use Drugs: Cohort Study.
IF 2 Pub Date : 2025-07-18 DOI: 10.2196/63147
Taisuke Sato, Emily D Grussing, Ruchi Patel, Jessica Ridgway, Joji Suzuki, Benjamin Sweigart, Robert Miller, Alysse G Wurcel

Background: People who use drugs (PWUD) are at heightened risk of severe injection-related infections. Current research relies on billing codes to identify PWUD, a methodology with suboptimal accuracy that may underestimate the economic, racial, and ethnic diversity of hospitalized PWUD.

Objective: The goal of this study is to examine the impact of natural language processing (NLP) on enhancing identification of PWUD in electronic medical records, with a specific focus on determining improved systems for identifying populations who may previously have been missed, including people with low income and those from racially and ethnically minoritized populations.

Methods: Health informatics specialists assisted in querying a cohort of likely PWUD hospital admissions at Tufts Medical Center between 2020 and 2022 using the following criteria: (1) ICD-10 codes indicative of drug use, (2) positive drug toxicology results, (3) prescriptions for medications for opioid use disorder, and (4) NLP-detected presence of "token" keywords in the electronic medical records likely indicative of the patient being a PWUD. Hospital admissions were split into two groups: highly documented (all four criteria present) and minimally documented (NLP only). These groups were examined to assess the impact of race, ethnicity, and social vulnerability index. The positive predictive value was calculated with chart review as the "gold standard."
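
The grouping and validation logic described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions: the keyword list, the `Admission` fields, and the naive substring matching are all hypothetical, since the abstract does not list the study's actual tokens or NLP pipeline.

```python
from dataclasses import dataclass

# Hypothetical keyword list; the study's actual "token" keywords are not given.
TOKENS = ("injection drug use", "ivdu", "heroin")


@dataclass
class Admission:
    icd10_drug_use: bool        # criterion 1
    positive_toxicology: bool   # criterion 2
    moud_prescription: bool     # criterion 3
    note_text: str              # free text searched for criterion 4

    @property
    def nlp_token_present(self) -> bool:
        """Naive case-insensitive substring match, standing in for the NLP step."""
        text = self.note_text.lower()
        return any(token in text for token in TOKENS)


def documentation_group(adm: Admission) -> str:
    """Assign an admission to one of the two groups named in the Methods."""
    structured = (adm.icd10_drug_use, adm.positive_toxicology, adm.moud_prescription)
    if all(structured) and adm.nlp_token_present:
        return "highly documented"
    if adm.nlp_token_present and not any(structured):
        return "minimally documented"
    return "other"


def positive_predictive_value(confirmed: list) -> float:
    """PPV = confirmed flags / all flags, with chart review as the reference.

    Each entry is True if chart review confirmed the flagged admission.
    """
    if not confirmed:
        raise ValueError("no flagged admissions to evaluate")
    return sum(confirmed) / len(confirmed)


adm = Admission(False, False, False, "Patient reports heroin use last week.")
print(documentation_group(adm))
print(positive_predictive_value([True, False, True, True]))
```

A real pipeline would need negation handling ("denies heroin use") and word-boundary matching, which plain substring search does not provide.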

Results: The cohort included 4548 hospital admissions, with broad heterogeneity in how people entered the cohort and subcohorts; a total of 288 hospital admissions entered the cohort through NLP token presence alone. NLP demonstrated a 54% positive predictive value, outperforming biomarkers, prescriptions for medications for opioid use disorder, and ICD codes in identifying hospitalizations of PWUD. Additionally, NLP significantly enhanced these methods when integrated into the identification algorithm. The study also found that people from racially and ethnically minoritized communities and those with a lower social vulnerability index were significantly more likely to have lower rates of PWUD-related documentation.

Conclusions: NLP proved effective in identifying hospitalizations of PWUD, surpassing traditional methods. While further refinement is needed, NLP shows promising potential in minimizing health care disparities.

Citations: 0
AI-SDM: A Concept of Integrating AI Reasoning into Shared Decision-Making.
IF 2 Pub Date : 2025-07-08 DOI: 10.2196/75866
Mohammed As'ad, Nawarh Faran, Hala Joharji

Unstructured: Shared decision-making is central to patient-centered care but is often hampered by AI systems that focus on technical transparency rather than delivering context-rich, clinically meaningful reasoning. Although XAI methods elucidate how decisions are made, they fall short in addressing the "why" that supports effective patient-clinician dialogue. To bridge this gap, we introduce AI-SDM, a conceptual framework designed to integrate AI-based reasoning into shared decision-making to enhance care quality while preserving patient autonomy. AI-SDM is a structured, multi-model framework that synthesizes predictive modelling, evidence-based recommendations, and generative AI techniques to produce adaptive, context-sensitive explanations. The framework distinguishes conventional AI explainability from AI reasoning, prioritizing the generation of tailored, narrative justifications that inform shared decisions. A hypothetical clinical scenario in stroke management is used to illustrate how AI-SDM facilitates an iterative, triadic deliberation process between healthcare providers, patients, and AI outputs. This integration is intended to transform raw algorithmic data into actionable insights that directly support the decision-making process without supplanting human judgment.

Citations: 0
Deep Learning Multi Modal Melanoma Detection: Algorithm Development and Validation.
IF 2 Pub Date : 2025-07-05 DOI: 10.2196/66561
Nithika Vivek, Karthik Ramesh

Background: The visual similarity of melanoma and seborrheic keratosis has made it difficult for elderly patients with disabilities to know when to seek medical attention, contributing to the metastasis of melanoma.

Objective: In this paper, we present a novel multi-modal deep learning-based technique to distinguish between melanoma and seborrheic keratosis.

Methods: Our strategy is three-fold: (1) use patient image data to train and test three deep learning models using transfer learning (ResNet50, InceptionV3, and VGG16) and one author-designed model, (2) use patient metadata to train and test a deep learning model, and (3) combine the predictions of the most accurate image model with those of the metadata model, using nonlinear least squares regression to assign ideal weights to each model for a combined prediction.
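
Step (3) can be illustrated with a simplified sketch. The abstract describes nonlinear least squares; the code below swaps in a simpler technique, a closed-form linear least-squares fit of a single blending weight, and assumes the two weights sum to 1. Both simplifications are assumptions made for illustration, and the probabilities shown are made up.

```python
def fit_blend_weight(p_image, p_meta, labels):
    """Least-squares weight w for the blend w * p_image + (1 - w) * p_meta.

    Setting the derivative of the squared error to zero gives
    w = sum((a - b) * (y - b)) / sum((a - b) ** 2), clipped to [0, 1].
    """
    num = sum((a - b) * (y - b) for a, b, y in zip(p_image, p_meta, labels))
    den = sum((a - b) ** 2 for a, b in zip(p_image, p_meta))
    if den == 0:
        return 0.5  # both models agree everywhere, so any weight is equivalent
    return min(1.0, max(0.0, num / den))


def blend(p_image, p_meta, w):
    """Combined melanoma probability for each case."""
    return [w * a + (1 - w) * b for a, b in zip(p_image, p_meta)]


# Hypothetical validation-set probabilities (1 = melanoma, 0 = seborrheic keratosis).
p_image = [0.9, 0.3, 0.6, 0.2]
p_meta = [0.7, 0.1, 0.8, 0.4]
labels = [1, 0, 1, 0]

w = fit_blend_weight(p_image, p_meta, labels)
print(w, blend(p_image, p_meta, w))
```

The weight should be fitted on held-out validation data rather than the test set, so that the reported accuracy of the combined model is not inflated.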

Results: The accuracy of the combined model was 88% (195/221 classified correctly) on test data from the HAM10000 dataset. Model reliability was assessed by visualizing the output activation map of each model and comparing the diagnosis patterns to those of dermatologists. The addition of metadata to the image dataset was key to reducing the false-negative and false-positive rates simultaneously, thereby producing better metrics and improving overall model accuracy.

Conclusions: Results from this experiment could be used to eliminate late diagnosis of melanoma via easy access to an app. Future experiments can utilize text data (subjective data pertaining to how the patient felt over a certain period of time) to allow this model to reflect the real hospital setting to a greater extent.

Citations: 0
Leveraging Large Language Models for Accurate Retrieval of Patient Information From Medical Reports: Systematic Evaluation Study.
Pub Date : 2025-07-03 DOI: 10.2196/68776
Angel Manuel Garcia-Carmona, Maria-Lorena Prieto, Enrique Puertas, Juan-Jose Beunza

Background: The digital transformation of health care has introduced both opportunities and challenges, particularly in managing and analyzing the vast amounts of unstructured medical data generated daily. There is a need to explore the feasibility of generative solutions in extracting data from medical reports, categorized by specific criteria.

Objective: This study aimed to investigate the application of large language models (LLMs) for the automated extraction of structured information from unstructured medical reports, using the LangChain framework in Python.

Methods: This study systematically evaluated leading LLMs (GPT-4o, Llama 3, Llama 3.1, Gemma 2, Qwen 2, and Qwen 2.5), using zero-shot prompting techniques and embedding results into a vector database, to assess their performance in extracting patient demographics, diagnostic details, and pharmacological data.

Results: Evaluation metrics, including accuracy, precision, recall, and F1-score, revealed high efficacy across most categories, with GPT-4o achieving the highest overall performance (91.4% accuracy).
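
The metrics named above can be computed for a single extracted field as in the sketch below. The scoring convention is an assumption, not something the abstract specifies: a wrong value counts as both a false positive and a false negative, and a field correctly left empty counts as neither. The names used in the example are invented.

```python
def score_field(predicted, gold):
    """Count true positives, false positives, and false negatives for one field.

    `None` means the field was not extracted (predicted) or is absent (gold).
    """
    tp = fp = fn = 0
    for pred, true in zip(predicted, gold):
        if pred is not None and pred == true:
            tp += 1
        else:
            if pred is not None:
                fp += 1  # extracted something that is wrong or spurious
            if true is not None:
                fn += 1  # a real value was missed or extracted incorrectly
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical extracted patient names versus hand-labeled reference values.
predicted = ["Ana", None, "Luis", "Eva"]
gold = ["Ana", "Juan", "Luiz", None]

counts = score_field(predicted, gold)
print(counts, precision_recall_f1(*counts))
```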

Conclusions: The findings highlight notable differences in precision and recall between models, particularly in extracting names and age-related information. There were challenges in processing unstructured medical text, including variability in model performance across data types. Our findings demonstrate the feasibility of integrating LLMs into health care workflows; LLMs offer substantial improvements in data accessibility and support clinical decision-making processes. In addition, the paper describes the role of retrieval-augmented generation techniques in enhancing information retrieval accuracy, addressing issues such as hallucinations and outdated data in LLM outputs. Future work should explore the need for optimization through larger and more diverse training datasets, advanced prompting strategies, and the integration of domain-specific knowledge to improve model generalizability and precision.

Citations: 0