Journal of Medical Systems: Latest Articles

Activities of Daily Living Detection through Energy Consumption Data and Machine Learning to Support Independent Aging.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-10-03 DOI: 10.1007/s10916-025-02256-2
Alejandro Pérez-Vereda, Jesús Fontecha, Adrián Sanchez-Miguel, Luis Cabañero, Iván González, Christopher Nugent

The aging population presents significant challenges for healthcare and social services, emphasizing the need for innovative solutions that support independent living. This study explores the feasibility of identifying Instrumental Activities of Daily Living (IADLs) through power consumption data collected from a smart plug-based system. Using a combination of unsupervised and supervised machine learning techniques, including K-Means clustering and Long Short-Term Memory (LSTM) networks, we developed a method to classify and predict IADLs based on energy usage patterns. The REFIT dataset was used to train and validate the models, ensuring generalizability across different households. Results demonstrate that K-Means clustering effectively groups energy consumption patterns in a reasonable time, as assessed with the Silhouette and Davies-Bouldin (DB) metrics (Silhouette score of 0.88 and Davies-Bouldin Index of 0.29), while LSTM models trained on monthly household data classified activities over time at high rates (F1-score of 0.99). IADLs like cooking, cleaning, and entertainment showed the highest classification accuracy due to their distinct energy features. This approach enables non-intrusive monitoring of daily routines, offering potential applications in Ambient Assisted Living (AAL) environments. Despite limitations in detecting activities without direct energy consumption, this study highlights the potential of energy-based activity recognition for promoting independent aging. Future work will focus on refining abnormal behavior detection and integrating additional contextual factors to improve accuracy.
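The clustering step and the two quality metrics named in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the readings below are made-up wattages rather than REFIT data, the clustering is one-dimensional, and the initial centers are hand-picked assumptions. It only shows how K-Means, the Silhouette score (higher is better), and the Davies-Bouldin Index (lower is better) relate.

```python
from statistics import mean

def kmeans_1d(values, centers, iterations=50):
    """Lloyd's algorithm on scalar readings; returns (labels, centers)."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assign each reading to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Move each center to the mean of its members (unchanged if empty).
        updated = []
        for c in range(len(centers)):
            members = [v for v, l in zip(values, labels) if l == c]
            updated.append(mean(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers

def silhouette(values, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = set(labels)
    scores = []
    for i, (v, l) in enumerate(zip(values, labels)):
        own = [abs(v - w) for j, (w, m) in enumerate(zip(values, labels))
               if m == l and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(mean(abs(v - w) for w, m in zip(values, labels) if m == other)
                for other in clusters if other != l)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)

def davies_bouldin(values, labels, centers):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation."""
    ks = sorted(set(labels))
    scatter = {k: mean(abs(v - centers[k]) for v, l in zip(values, labels) if l == k)
               for k in ks}
    return mean(max((scatter[i] + scatter[j]) / abs(centers[i] - centers[j])
                    for j in ks if j != i)
                for i in ks)

# Hypothetical plug readings in watts: standby, a kettle, a television.
readings = [2, 3, 2, 4, 1500, 1520, 1490, 60, 65, 58]
labels, centers = kmeans_1d(readings, centers=[0, 100, 1000])
sil = silhouette(readings, labels)
dbi = davies_bouldin(readings, labels, centers)
```

On real multi-appliance data the features would be multi-dimensional and the number of clusters would need to be chosen, for example by comparing these two metrics across candidate values of k.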

Journal of Medical Systems 49(1):124. Journal Article. https://doi.org/10.1007/s10916-025-02256-2
Citations: 0
GAN-Enhanced Hybrid Deep Learning with Explainable AI for Automated Cataract Diagnosis.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-10-02 DOI: 10.1007/s10916-025-02249-1
Shashank Mouli Satapathy, Mitali Gopinath Paul, Anusha Garg, Suhani Bhatnagar

Cataracts, among the most prevalent eye disorders, result in diminished vision due to cloudiness in the eye's natural lens. Timely diagnosis is crucial for preventing irreversible damage. While effective, existing automated systems encounter difficulties like limited dataset variety, lack of interpretability, and suboptimal generalization in real-world scenarios. This study presents a novel deep learning-based method that incorporates Generative AI (GenAI) and Explainable AI (XAI) to enhance cataract detection. The proposed methodology leverages a fine-tuned InceptionResNetV2 with additional layers, trained on a hybrid dataset enriched by merging six open-source datasets, along with synthetic images generated via Generative Adversarial Networks (GANs). Class weights address data imbalance, while stratified K-Fold cross-validation ensures robust evaluation. Our system offers graphical interpretation through Gradient-weighted Class Activation Mapping (Grad-CAM) heatmaps, supporting clinical transparency and reliability. The model evaluation reports a mean K-Fold accuracy of 97.58% with a standard deviation of 0.0040, and a 95% confidence interval (CI) of (0.9702, 0.9814). On the external dataset, the model achieved an overall accuracy of 97% and an AUC of 0.9944; for the cataract class, it achieved a precision of 96%, a recall (sensitivity) of 94%, and an F1-score of 95%. Our method, by incorporating synthetic images and explainable AI, ensures enhanced data diversity, addresses class imbalance, reduces dependency on large annotated datasets, and offers greater interpretability that facilitates expert validation and builds stronger clinical trust, making it superior to existing cataract detection systems.
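The reported mean, standard deviation, and 95% confidence interval are the usual summary of per-fold accuracies. The sketch below shows one common way to derive such a summary, a Student's t interval on the fold mean. The fold values are hypothetical, and neither the number of folds nor the interval method is stated in the abstract, so both are assumptions here.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}

def kfold_summary(fold_accuracies):
    """Return (mean, sample std, (ci_low, ci_high)) for per-fold accuracies."""
    k = len(fold_accuracies)
    m = mean(fold_accuracies)
    s = stdev(fold_accuracies)  # sample standard deviation (divides by k - 1)
    half_width = T_CRIT_95[k - 1] * s / sqrt(k)
    return m, s, (m - half_width, m + half_width)

# Hypothetical accuracies from five folds, NOT the paper's per-fold results.
folds = [0.97, 0.98, 0.975, 0.98, 0.97]
fold_mean, fold_std, (ci_low, ci_high) = kfold_summary(folds)
```

With few folds the t critical value is large, so the interval is noticeably wider than a normal-approximation interval would be.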

Journal of Medical Systems 49(1):123. Journal Article. https://doi.org/10.1007/s10916-025-02249-1
Citations: 0
Performance of Large Language Models in Complex Anesthesia Decision-Making: A Comparative Study of Four LLMs in High-Risk Patients.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-10-01 DOI: 10.1007/s10916-025-02247-3
Qian Ruan, Jinghong Shi, Yunke Dai, Pingliang Yang, Na Zhu, Shun Wang

This study evaluated and compared the performance of four Large Language Models (LLMs) in anesthesia decision-making for critically ill obstetric and geriatric patients and analyzed their decision reliability across different surgical specialties. It was a prospective comparative analysis using standardized case evaluations of four LLMs (ChatGPT-4o, Claude 3.5 Sonnet, DeepSeek-R1, and Grok 3). Thirty complex surgical cases (10 obstetric, 20 geriatric; 8 specialties) were analyzed. A 12-dimensional framework tested the models using unified prompts and decision points. Five trained anesthesiologists independently evaluated the models across six dimensions (patient assessment, anesthesia plan, risk management, individualization, contingency planning, decision logic; 1-10 scale, total 6-60). Overall, DeepSeek performed best (51.43 ± 2.74 points), significantly outperforming other models (P < 0.001). For obstetric cases, the mean scores were: DeepSeek (52.00 ± 1.83), Grok (49.40 ± 3.06), ChatGPT (47.60 ± 2.88), and Claude (46.60 ± 2.17). For geriatric cases, scores were: DeepSeek (51.15 ± 3.10), Grok (48.60 ± 2.33), ChatGPT (47.35 ± 2.50), and Claude (45.75 ± 2.05). Across specialties, all models performed best in hepatobiliary surgery, burn surgery, and thoracic surgery. DeepSeek demonstrated consistent performance across all dimensions, with notable advantages in decision logic (8.80 ± 0.40) and contingency planning (8.27 ± 0.45). All LLMs demonstrated strong anesthesia decision-making capabilities, with DeepSeek showing the best overall performance. Exploratory analysis revealed performance variations across specialties, although small sample sizes preclude definitive conclusions. Clinical implementation should consider specialty-specific factors and decision process characteristics.
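The scoring scheme (six dimensions rated 1-10, summed to a 6-60 total, then reported as mean ± SD) can be made concrete with a short sketch. The dimension names follow the abstract; the individual ratings are invented for illustration, and how the study pooled ratings across evaluators and cases is not stated, so the simple mean and sample SD over totals is an assumption.

```python
from statistics import mean, stdev

DIMENSIONS = (
    "patient_assessment", "anesthesia_plan", "risk_management",
    "individualization", "contingency_planning", "decision_logic",
)

def total_score(rating):
    """Sum six 1-10 dimension scores into a 6-60 total."""
    for name in DIMENSIONS:
        if not 1 <= rating[name] <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
    return sum(rating[name] for name in DIMENSIONS)

def summarize(ratings):
    """Mean and sample standard deviation of the totals."""
    totals = [total_score(r) for r in ratings]
    return mean(totals), stdev(totals)

# Hypothetical ratings of one model response by five evaluators.
raw = [(9, 8, 9, 8, 8, 9), (8, 8, 9, 9, 8, 9), (9, 9, 9, 8, 9, 9),
       (8, 8, 8, 8, 8, 9), (9, 8, 8, 9, 8, 9)]
ratings = [dict(zip(DIMENSIONS, scores)) for scores in raw]
mean_total, sd_total = summarize(ratings)
```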

Journal of Medical Systems 49(1):122. Journal Article. https://doi.org/10.1007/s10916-025-02247-3
Citations: 0
Clinical Risk Computation by Large Language Models Using Validated Risk Scores.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-09-30 DOI: 10.1007/s10916-025-02261-5
Kaan Kara, Tuba Gunel

Recent advances in artificial intelligence have propelled Large Language Models (LLMs) in natural language understanding, enabling new healthcare applications. While LLMs can analyze health data, directly predicting patient risk scores can be unreliable due to inaccuracies, biases, and difficulty interpreting complex medical data. A more trustworthy approach uses LLMs to calculate traditional clinical risk scores, which are validated, evidence-based formulas widely accepted in medicine. This improves validity, transparency, and safety by relying on established scoring systems rather than LLM-generated risk assessments, while still allowing LLMs to enhance clinical workflows through clear and interpretable explanations. In this study, we evaluated three public LLMs (GPT-4o-mini, DeepSeek v3, and Google Gemini 2.5 Flash) in calculating five clinical risk scores: CHA₂DS₂-VASc, HAS-BLED, Wells Score, Charlson Comorbidity Index, and Framingham Risk Score. We created 100 patient profiles (20 per score) representing diverse clinical scenarios and converted them into natural language clinical notes. These served as prompts for the LLMs to extract information and compute risk scores. We compared LLM-generated scores to reference scores from validated formulas using accuracy, precision, recall, F1 score, and Pearson correlation. GPT-4o-mini and Gemini 2.5 Flash outperformed DeepSeek v3, showing near-perfect agreement on most scores. However, all models struggled with the complex Framingham Risk Score, indicating challenges for general LLMs in complex risk calculations.
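A reference calculator of the kind the LLM outputs were compared against can be very small for the simpler scores. The sketch below implements the standard CHA₂DS₂-VASc point assignments; the `Patient` structure, its field names, and the `agreement` helper are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    """Hypothetical structured profile; field names are illustrative."""
    age: int
    female: bool
    congestive_heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    stroke_tia_thromboembolism: bool = False
    vascular_disease: bool = False

def cha2ds2_vasc(p):
    """CHA2DS2-VASc stroke-risk score (0-9) for atrial fibrillation."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0    # C
    score += 1 if p.hypertension else 0                # H
    if p.age >= 75:                                    # A2: age 75 or older
        score += 2
    elif p.age >= 65:                                  # A: age 65 to 74
        score += 1
    score += 1 if p.diabetes else 0                    # D
    score += 2 if p.stroke_tia_thromboembolism else 0  # S2
    score += 1 if p.vascular_disease else 0            # V
    score += 1 if p.female else 0                      # Sc: sex category
    return score

def agreement(reference_scores, llm_scores):
    """Fraction of profiles where the LLM score equals the reference."""
    matches = sum(r == m for r, m in zip(reference_scores, llm_scores))
    return matches / len(reference_scores)

example = Patient(age=78, female=True, hypertension=True, diabetes=True)
example_score = cha2ds2_vasc(example)
```

Scores such as Framingham involve continuous inputs and coefficient tables rather than a handful of additive points, which may be part of why they proved harder for the models.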

Journal of Medical Systems 49(1):121. Journal Article. https://doi.org/10.1007/s10916-025-02261-5
Citations: 0
A Pretraining Approach for Small-sample Training Employing Radiographs (PASTER): a Multimodal Transformer Trained by Chest Radiography and Free-text Reports.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-09-30 DOI: 10.1007/s10916-025-02263-3
Kai-Chieh Chen, Matthew Kuo, Chun-Ho Lee, Hao-Chun Liao, Dung-Jang Tsai, Shing-An Lin, Chih-Wei Hsiang, Cheng-Kuang Chang, Kai-Hsiung Ko, Yi-Chih Hsu, Wei-Chou Chang, Guo-Shu Huang, Wen-Hui Fang, Chin-Sheng Lin, Shih-Hua Lin, Yuan-Hao Chen, Yi-Jen Hung, Chien-Sung Tsai, Chin Lin

While deep convolutional neural networks (DCNNs) have achieved remarkable performance in chest X-ray (CXR) interpretation, their success typically depends on access to large-scale, expertly annotated datasets. However, collecting such data in real-world clinical settings can be difficult because of limited labeling resources, privacy concerns, and patient variability. In this study, we applied a multimodal Transformer pretrained on free-text reports and their paired CXRs to evaluate the effectiveness of this method in settings with limited labeled data. Our dataset consisted of more than 1 million CXRs, each accompanied by reports from board-certified radiologists and 31 structured labels. The results indicated that a linear model trained on embeddings from the pretrained model achieved AUCs of 0.907 and 0.903 on internal and external test sets, respectively, using only 128 cases and 384 controls; the results were comparable to those of DenseNet trained on the entire dataset, whose AUCs were 0.908 and 0.903, respectively. Additionally, we demonstrated similar results by extending the application of this approach to a subset annotated with structured echocardiographic reports. Furthermore, this multimodal model exhibited excellent small-sample learning capabilities when tested on external validation sets such as CheXpert and ChestX-ray14. This research significantly reduces the sample size necessary for future artificial intelligence advancements in CXR interpretation.
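The evaluation idea, fitting only a linear model on frozen embeddings and scoring it by AUC, can be sketched without any deep learning framework. Everything below is illustrative: the two-dimensional "embeddings" are toy values standing in for the pretrained model's output, and plain logistic regression with batch gradient updates is an assumed choice of linear model, not necessarily the one used in the study.

```python
from math import exp

def train_linear_probe(embeddings, labels, lr=0.5, steps=200):
    """Logistic regression fitted by batch gradient ascent on fixed embeddings."""
    dim = len(embeddings[0])
    n = len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + exp(-z))  # predicted probability of "case"
            for d in range(dim):
                grad_w[d] += (y - p) * x[d]
            grad_b += y - p
        w = [wi + lr * g / n for wi, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b

def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    wins = sum((c > k) + 0.5 * (c == k)
               for c in case_scores for k in control_scores)
    return wins / (len(case_scores) * len(control_scores))

# Toy embeddings: the encoder that would produce them is frozen and not shown.
cases = [(1.0, 0.8), (0.9, 1.2), (1.3, 0.7), (0.8, 1.1)]
controls = [(-1.0, -0.9), (-0.7, -1.2), (-1.1, -0.8),
            (-0.9, -1.0), (-1.2, -0.6), (-0.8, -1.3)]
w, b = train_linear_probe(cases + controls, [1] * len(cases) + [0] * len(controls))

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

probe_auc = auc([score(x) for x in cases], [score(x) for x in controls])
```

Because only the small linear layer is trained, the number of labeled examples needed is governed by the embedding quality rather than by the size of the encoder.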

Journal of Medical Systems 49(1):120. Journal Article. https://doi.org/10.1007/s10916-025-02263-3
Citations: 0
Enhancing Patient Identification Accuracy in Shared Child Health Records: a Hybrid Approach for the Lao Language Context.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-09-26 DOI: 10.1007/s10916-025-02260-6
Thepphouthone Sorsavanh, Chang Liu, Goshiro Yamamoto, Yukiko Mori, Shinji Kobayashi, Tomohiro Kuroda

The Shared Child Health Record (SCHR) project in Lao People's Democratic Republic (PDR) aims to enhance pediatric health care services and health outcomes by enabling data exchange between health care systems. However, duplicate records remain a persistent challenge because patient identification is hindered by non-Latin script complexities, including phonetic variations, a tonal alphabet, and temporary naming practices (e.g., placeholder names such as "Eanoi"). Existing patient-matching algorithms designed for Latin scripts underperform in this context. We assessed deterministic, probabilistic, and hybrid matching approaches using a Lao SCHR dataset of 20,433 records. A manual gold standard review (3,191 matches) validated their performance. Probabilistic matching employed the Fellegi-Sunter model with Jaro-Winkler similarity, whereas the hybrid method combined deterministic rules (exact name/DOB matches) and probabilistic adjustments for unresolved cases. The hybrid and probabilistic methods consistently outperformed deterministic matching, achieving a 90% recall rate on the SCHR dataset. Despite its lower performance in Lao health records, the hybrid method resolved approximately 2,872 duplicates in SCHR. Challenges included twin records (shared identifiers) and temporary-to-permanent name transitions. This study is the first to adapt patient-matching methodologies for Lao's linguistic and infrastructural context. While hybrid methods show promise, performance gaps persist compared with those of Latin-based systems. These findings have significant implications with respect to improving the accuracy and efficiency of HIE systems in Lao PDR and other resource-limited settings. Clinical trial number: Not applicable.
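The hybrid scheme described above, deterministic rules first and Fellegi-Sunter weights with Jaro-Winkler name similarity for the remainder, can be sketched as follows. The m/u probabilities, the field list, the 0.90 name-agreement cutoff, and the decision threshold are all illustrative assumptions rather than the study's parameters, and the sample names are Latin-script placeholders. Lao text would likely need Unicode normalization before comparison.

```python
from math import log2

def jaro(s, t):
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order are transpositions.
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s, t, scaling=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)

# Hypothetical (m, u) pairs: P(agree | same child), P(agree | different child).
FIELDS = {"name": (0.95, 0.01), "dob": (0.98, 0.003), "village": (0.90, 0.05)}

def fs_weight(field, agrees):
    """Fellegi-Sunter log2 weight for agreement or disagreement on a field."""
    m, u = FIELDS[field]
    return log2(m / u) if agrees else log2((1 - m) / (1 - u))

def hybrid_match(a, b, threshold=8.0):
    # Deterministic rule: exact name and date of birth.
    if a["name"] == b["name"] and a["dob"] == b["dob"]:
        return True
    # Probabilistic fallback for pairs the rule does not resolve.
    total = fs_weight("name", jaro_winkler(a["name"], b["name"]) >= 0.90)
    total += fs_weight("dob", a["dob"] == b["dob"])
    total += fs_weight("village", a["village"] == b["village"])
    return total >= threshold

rec_a = {"name": "MARTHA", "dob": "2020-01-05", "village": "X"}
rec_b = {"name": "MARHTA", "dob": "2020-01-05", "village": "X"}
rec_c = {"name": "SOMCHAI", "dob": "2019-07-21", "village": "X"}
```

Twin records illustrate the limit of this design: two distinct children can agree on date of birth, village, and a placeholder name, so additional fields or manual review would be needed for such pairs.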

Journal of Medical Systems 49(1):119. Journal Article. https://doi.org/10.1007/s10916-025-02260-6 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12474658/pdf/
Citations: 0
Automated Resectability Classification of Pancreatic Cancer CT Reports with Privacy-Preserving Open-Weight Large Language Models: A Multicenter Study.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-09-24 DOI: 10.1007/s10916-025-02248-2
Jeong Hyun Lee, Ji Hye Min, Kyowon Gu, Seungchul Han, Jeong Ah Hwang, Seo-Youn Choi, Kyoung Doo Song, Jeong Eun Lee, Jisun Lee, Ji Eun Moon, Hasmik Adetyan, Ju Dong Yang

Purpose: To evaluate the effectiveness of open-weight large language models (LLMs) in extracting key radiological features and determining National Comprehensive Cancer Network (NCCN) resectability status from free-text radiology reports for pancreatic ductal adenocarcinoma (PDAC). Methods: Prompts were developed using 30 fictitious reports, internally validated on 100 additional fictitious reports, and tested using 200 real reports from two institutions (January 2022 to December 2023). Two radiologists established ground truth for 18 key features and resectability status. Gemma-2-27b-it and Llama-3-70b-instruct models were evaluated using recall, precision, F1-score, extraction accuracy, and overall resectability accuracy. Statistical analyses included McNemar's test and mixed-effects logistic regression. Results: In internal validation, Llama had significantly higher recall than Gemma (99% vs. 95%, p < 0.01) and slightly higher extraction accuracy (98% vs. 97%). Llama also demonstrated higher overall resectability accuracy (93% vs. 91%). In the internal test set, both models achieved 96% recall and 96% extraction accuracy. Overall resectability accuracy was 95% for Llama and 93% for Gemma. In the external test set, both models had 93% recall. Extraction accuracy was 93% for Llama and 95% for Gemma. Gemma achieved higher overall resectability accuracy (89% vs. 83%), but the difference was not statistically significant (p > 0.05). Conclusion: Open-weight models accurately extracted key radiological features and determined NCCN resectability status from free-text PDAC reports. While internal dataset performance was robust, performance on external data decreased, highlighting the need for institution-specific optimization.
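The extraction metrics and the paired comparison between the two models can be sketched in a few lines. The counts used here are invented, and the abstract does not say which variant of McNemar's test was applied, so the exact binomial form below is an assumption.

```python
from math import comb

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: items model A got right and model B got wrong
    c: items model B got right and model A got wrong
    Concordant items do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts, not taken from the study.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
p_value = mcnemar_exact(b=1, c=9)
```

McNemar's test fits this design because both models were run on the same reports, so only the reports on which they disagree carry information about which one is better.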

Integrating Pediatric Anesthesia Sustainability Metrics into Native Electronic Health Records: A Clinical Informatics Approach.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-09-22 DOI: 10.1007/s10916-025-02259-z
Mandy Lam, Ashley Wu, Karna Patel, Elaine Ng, Eric Greenwood, Clyde Matava

The growing emphasis on sustainability in healthcare has highlighted anesthetic gases as notable contributors to the sector's greenhouse gas emissions. While adult anesthesia practices have increasingly adopted mitigation strategies, such as using lower fresh gas flows and total intravenous anesthesia, pediatric anesthesia poses distinct challenges due to the unique physiological and pharmacological requirements of neonates, infants, and children. The use of third-party applications for accessing anesthesia medical record data is costly. This technical report describes the development and implementation of pediatric-specific anesthesia sustainability metrics and their integration into native electronic health record systems for real-time data capture and feedback. Using a nominal consensus group process, 24 pediatric-focused metrics were identified across key perioperative phases. Subsequent integration into Epic's Anesthesia module facilitated automated data collection and the creation of interactive dashboards, which offer both department-wide and individualized provider feedback. Our report describes the feasibility of designing novel pediatric-specific sustainability metrics that can be used within the electronic medical record to benchmark environmental goals in pediatric anesthesia practice.

Social Network Analysis of Secure Text Messaging Metadata During Clinical Deterioration in an Inpatient Children's Hospital Setting.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-09-19 DOI: 10.1007/s10916-025-02250-8
Andrew Harold Smith, Brant Tudor, Vishnu Mohan, Mohamed A Rehman, Luis Ahumada

Mitigating clinical deterioration relies upon recognition (afferent limb) and interventions (efferent limb) by a healthcare team. Healthcare provider (HP) communication by text messaging plays a role in facilitating both limbs in the inpatient setting. We sought to quantitatively characterize healthcare provider team communications through social network analysis (SNA) of secure text messages exchanged in the inpatient setting, and as they relate to a subgroup of patients demonstrating a deterioration during their hospitalization. Messages linked to inpatients and exchanged between HPs over a 12-month period, including a cohort of messages linked to patients experiencing deterioration, were analyzed using SNA. Subnetworks corresponding to individual patient encounters were constructed, including a series of subnetworks pertaining to patients with an impending clinical deterioration. Network and network participant characteristics were calculated and analyzed. From October 2022 through September 2023 there were 1,065,225 messages delivered by 3,272 HPs, associated with 4,328 inpatient hospital encounters, of which 120 hospital encounters were associated with a deterioration. SNA demonstrated significantly higher measures of eigenvector centrality among frontline providers (FLP), including advanced practice providers and housestaff, relative to attending physicians (p < 0.001) and registered nurses (p < 0.001), consistent with greater influence of the FLP on information dissemination through the entire network. Within individual subnetworks associated with the care of patients experiencing a clinical deterioration, FLP participants demonstrated greater overall network influence (p = 0.032) relative to FLP counterparts in networks not associated with a deterioration, despite comparable numbers of participants and connections. Using SNA, we quantitatively characterized a text messaging network within an inpatient hospital setting, demonstrating the importance of FLPs to information dissemination, a finding demonstrated specifically within subnetworks dedicated to the care of individual deteriorating patients. Understanding characteristics of a dynamic communication network of healthcare providers may prove a valuable target in facilitating communication and in mitigating the risks of deterioration. IRB Approval: Johns Hopkins Medicine IRB (#CIR00419339). Clinical trial number: Not applicable.
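
Eigenvector centrality, the influence measure reported above, scores a participant highly when they are connected to other highly scored participants. The sketch below estimates it by power iteration on a small undirected message graph. It is an illustration under stated assumptions and not the study's pipeline: the abstract does not describe the software used, the function name is hypothetical, the graph is treated as unweighted and undirected, and the participant labels are invented.

```python
def eigenvector_centrality(edges, iterations=200, tol=1e-10):
    """Estimate eigenvector centrality for an undirected, unweighted graph.

    edges: iterable of (sender, receiver) pairs; direction is ignored.
    Returns a dict mapping each node to its score (unit Euclidean norm).
    """
    # Build an undirected adjacency map from the message pairs.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    score = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # Multiply by (A + I) rather than A: the eigenvectors are the same,
        # but the shift prevents oscillation on bipartite graphs such as stars.
        new = {n: score[n] + sum(score[m] for m in neighbours[n])
               for n in neighbours}
        norm = sum(x * x for x in new.values()) ** 0.5
        new = {n: x / norm for n, x in new.items()}
        if max(abs(new[n] - score[n]) for n in new) < tol:
            return new
        score = new
    return score


if __name__ == "__main__":
    # Hypothetical encounter subnetwork: one frontline provider (FLP)
    # exchanging messages with a nurse, an attending and a pharmacist.
    messages = [("FLP", "RN"), ("FLP", "Attending"), ("FLP", "Pharmacist")]
    for node, value in sorted(eigenvector_centrality(messages).items()):
        print(node, round(value, 3))
```

In this toy star-shaped network the hub receives the highest score, which mirrors the kind of pattern the abstract reports for frontline providers.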

Large Language Models in Neurology Treatment Decision-Making: a Scoping Review.
IF 5.7 Zone 3 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-09-16 DOI: 10.1007/s10916-025-02254-4
Rushabh Shah, Fabrice Jotterand

This scoping review evaluates the expanding role of large language models (LLMs) in neurology, an area drawing growing interest from researchers and clinicians alike. A substantial existing body of literature supports the efficacy of LLMs for diagnostic applications. However, clinicians' emerging point of interest now lies in understanding the applications of LLMs in guiding treatment decisions. Our study therefore aims to synthesize and evaluate existing neurological studies focused on LLMs in treatment decision-making. A comprehensive search was conducted in the electronic databases OVID/Medline, Web of Science, and the Cochrane Library through September 18th, 2024. Inclusion criteria comprised original studies published within the last five years that focused on evaluating the efficacy of LLMs in treatment decision-making in neurology. The protocol was registered on the Open Science Framework (https://doi.org/10.17605/OSF.IO/Y6N3E). Four studies were identified. ChatGPT was the LLM utilized in each article, though model versions varied. Each study demonstrated positive outcomes across varying metrics, with models generally aligning with clinician decisions. However, the lack of observed studies and variability of neurological topics limit the generalizability of these AI tools. This scoping review analyzes the existing body of evidence on LLMs in treatment decision-making in neurology. While current studies suggest potential to support clinical care, there is insufficient evidence at this stage to claim outcome improvement. Findings are not yet generalizable across neurological practice, as existing promise appears limited to narrow use cases. Prospective validation across subspecialties is needed to support broader clinical application.
