
International Journal of Medical Informatics: Latest Articles

Predictive value of machine learning for mortality risk in aortic dissection: a systematic review and meta-analysis
IF 4.1 | Zone 2, Medicine | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-11 | DOI: 10.1016/j.ijmedinf.2026.106271
Zhihong Han, Baixin Li, Jie Liu

Background

Aortic dissection (AD) is a critical cardiovascular disorder with substantial risks of short-term mortality. Some researchers have endeavored to utilize machine learning (ML) approaches to develop predictive models for the risk of mortality in AD. However, systematic evidence about the accuracy of these models remains scarce, which poses challenges to the development and enhancement of risk assessment tools. Therefore, this study seeks to systematically review the reliability of ML in forecasting the risk of mortality in AD.

Methods

A search was implemented through PubMed, Cochrane, Embase, and Web of Science up to September 11, 2025. The prediction model risk of bias (RoB) assessment tool (PROBAST) was leveraged to estimate the RoB of the included studies. Subgroup analyses were implemented based upon types of AD and time of death.

Results

In total, 35 studies were included, covering 19,838 patients with AD. Within the training datasets, ML models demonstrated a sensitivity (SEN) of 0.75 (95% CI: 0.72–0.78) and a specificity (SPE) of 0.77 (95% CI: 0.74–0.80) for predicting mortality in AD. Within the validation set, which mainly focused on type A aortic dissection (TAAD), the SEN was 0.79 (95% CI: 0.74–0.84) and the SPE was 0.78 (95% CI: 0.68–0.85). For in-hospital mortality, the SEN was 0.78 (95% CI: 0.72–0.83) and the SPE was 0.77 (95% CI: 0.65–0.86); for out-of-hospital mortality, the SEN and SPE ranged from 0.81 to 0.84 and from 0.74 to 0.86, respectively.
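As a reading aid for the figures above, the following minimal Python sketch shows how sensitivity, specificity, and an approximate 95% confidence interval can be computed for a single study from its confusion-matrix counts. The counts are hypothetical and are not taken from the review, and the Wilson score interval is an assumed choice; the review's pooled estimates come from a meta-analytic model that this sketch does not reproduce.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sen = tp / (tp + fn)  # proportion of deaths correctly flagged
    spe = tn / (tn + fp)  # proportion of survivors correctly cleared
    return sen, spe


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts for one study (illustration only)
tp, fn, tn, fp = 75, 25, 308, 92
sen, spe = sensitivity_specificity(tp, fn, tn, fp)
sen_low, sen_high = wilson_interval(tp, tp + fn)
print(f"SEN = {sen:.2f} (95% CI: {sen_low:.2f}-{sen_high:.2f}), SPE = {spe:.2f}")
```

Pooling such per-study estimates across 35 studies requires a dedicated meta-analysis method; the sketch only illustrates what each reported SEN/SPE pair measures.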

Conclusion

ML models demonstrate remarkable accuracy in forecasting the risk of mortality in AD and, to some extent, outperform existing scoring systems. Future research should incorporate more multi-center, multi-ethnic, and geographically varied cases to develop a more broadly applicable risk prediction tool and offer insights for tailored prevention strategies.
International Journal of Medical Informatics, vol. 209, Article 106271. Citations: 0

Automated extraction of fluoropyrimidine treatment and treatment-related toxicities from clinical notes using natural language processing
IF 4.1 | Zone 2, Medicine | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-10 | DOI: 10.1016/j.ijmedinf.2026.106276
Xizhi Wu, Madeline S. Kreider, Philip E. Empey, Chenyu Li, Yanshan Wang

Objective

Fluoropyrimidines are widely prescribed for colorectal and breast cancers, but are associated with toxicities such as hand-foot syndrome and cardiotoxicity. Since toxicity documentation is often embedded in clinical notes, we aimed to develop and evaluate natural language processing (NLP) methods to extract treatment and toxicity information.

Materials and methods

We constructed a gold-standard dataset of 236 clinical notes from 204,165 adult oncology patients. Domain experts annotated categories related to treatment regimens and toxicities. We developed rule-based, machine learning-based (Random Forest [RF], Support Vector Machine [SVM], Logistic Regression [LR]), deep learning-based (BERT, ClinicalBERT), and large language model (LLM)-based NLP approaches (zero-shot and error analysis prompting). Five-fold cross-validation was conducted to validate each model.
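The five-fold cross-validation mentioned above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions: the function name `k_fold_indices`, the shuffling seed, and the round-robin fold assignment are hypothetical choices, and the abstract does not say whether the authors stratified folds by category.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 236 notes, as in the gold-standard dataset described above
for fold_number, (train, test) in enumerate(k_fold_indices(236), start=1):
    print(f"fold {fold_number}: {len(train)} training notes, {len(test)} test notes")
```

Each note appears in exactly one test fold, so every model is evaluated on all 236 notes without ever being tested on a note it was trained on.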

Results

Error analysis prompting achieved the best precision, recall, and F1 scores for treatment (F1 = 1.000) and toxicity extraction (F1 = 0.965), whereas zero-shot prompting performed moderately (treatment F1 = 0.889, toxicity extraction F1 = 0.854). The rule-based approach reached F1 = 1.000 for treatment and F1 = 0.904 for toxicity extraction. LR and SVM ranked second and fourth for toxicity extraction (LR F1 = 0.914, SVM F1 = 0.903). Deep learning and RF underperformed: BERT reached F1 = 0.792 for treatment and F1 = 0.837 for toxicity extraction, ClinicalBERT reached F1 = 0.797 for treatment and F1 = 0.884 for toxicity extraction, and RF reached F1 = 0.745 for treatment and F1 = 0.853 for toxicity extraction.
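For readers less familiar with the metrics reported above, this minimal Python sketch computes precision, recall, and F1 by comparing extracted items against gold-standard annotations. The note identifiers and toxicity labels are hypothetical, and exact-match scoring over (note, label) pairs is an assumption; the abstract does not describe the authors' matching or averaging scheme.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 over sets of (note_id, label) pairs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical annotations (illustration only)
gold = {
    ("note1", "hand-foot syndrome"),
    ("note1", "cardiotoxicity"),
    ("note2", "diarrhea"),
    ("note3", "neutropenia"),
}
predicted = {
    ("note1", "hand-foot syndrome"),
    ("note2", "diarrhea"),
    ("note3", "neutropenia"),
    ("note3", "nausea"),  # a false positive
}
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

An F1 of 1.000, as reported for treatment extraction, means no missed and no spurious extractions on the evaluated notes.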

Discussion

LLM-based error analysis prompting outperformed all other approaches, followed by machine learning methods. Machine learning and deep learning methods were limited by the small amount of training data and showed limited generalizability, particularly for rare categories.

Conclusion

LLM-based error analysis most effectively extracted fluoropyrimidine treatment and toxicity information from clinical notes, and has strong potential to support oncology research and pharmacovigilance.
International Journal of Medical Informatics, vol. 209, Article 106276. Citations: 0

An old disease, a new linguistic challenge for large language models: patient education on psoriasis and psoriatic arthritis in an underrepresented medical language
IF 4.1 | Zone 2, Medicine | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-10 | DOI: 10.1016/j.ijmedinf.2025.106246
Ahmet Ugur Atilan, Niyazi Cetin

Objective

Large Language Models (LLMs) are increasingly applied to patient education, yet their performance in languages that are relatively underrepresented in medical-domain corpora and LLM training datasets remains underexplored. Psoriasis and psoriatic arthritis (PsA) are chronic, immune-mediated diseases requiring lifelong patient engagement, making them suitable conditions for evaluating the clarity, reliability, and inclusivity of AI-generated educational content. This study aimed to assess the comprehensibility, scientific reliability, and patient-centered communication of Turkish patient education materials for psoriasis vulgaris and PsA generated by seven state-of-the-art LLMs.

Methods

A cross-sectional analysis compared outputs from ChatGPT-4o, Gemini 2.0 Flash, Claude 3.7 Sonnet, Grok 3, Qwen 2.5, DeepSeek R1, and Mistral Large 2. Brochures were produced using standardized zero-shot prompts and evaluated via the Ateşman readability index and the DISCERN instrument. Overall differences in DISCERN scores across the seven models were assessed using a Friedman test, followed by Bonferroni-adjusted Wilcoxon signed-rank post-hoc analyses.
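The Ateşman readability index used above is commonly given as 198.825 − 40.175 × (syllables per word) − 2.610 × (words per sentence), with higher scores indicating easier text. The sketch below is a minimal illustration under stated assumptions: the function name `atesman_score` is hypothetical, syllables are approximated by counting vowels (which works for regular Turkish words), and the tokenization is naive (apostrophes and abbreviations are not handled). The abstract does not say which tool the authors used.

```python
import re

# Turkish vowels, including dotted/dotless i and circumflexed forms
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜâîûÂÎÛ")


def atesman_score(text):
    """Approximate the Atesman readability score for a Turkish text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    # In Turkish, each syllable contains exactly one vowel
    syllables = sum(1 for word in words for ch in word if ch in TURKISH_VOWELS)
    syllables_per_word = syllables / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


# Hypothetical brochure sentences (illustration only)
sample = "Sedef hastalığı bulaşıcı değildir. Doktorunuza danışın."
print(round(atesman_score(sample), 1))
```

Short sentences built from short words push the score up, which is why plainly worded brochures score as more accessible.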

Results

Readability scores ranged from 61.6 to 80.2 (mean = 71.3 ± 6.9), with ChatGPT-4o and Qwen 2.5 generating the most accessible texts. DISCERN reliability scores ranged from 38.5 to 60.5, with Claude 3.7 Sonnet and Gemini 2.0 Flash showing the highest accuracy. Models prioritizing factual precision produced denser language, while conversational models favored fluency but sacrificed depth. Notable variation was observed, with only Claude 3.7 Sonnet and Gemini 2.0 Flash consistently reflecting patient-centered perspectives.

Conclusion

LLMs showed observable differences in balancing clarity and reliability when generating health education leaflets in Turkish. Most outputs appeared to lack explicit psychosocial framing and emphasis on shared decision-making, which may suggest the need for more culturally adaptive training, clinician oversight, and locally grounded validation frameworks to support safe and inclusive AI-based patient education.
International Journal of Medical Informatics, vol. 209, Article 106246. Citations: 0

Empowering caregivers of individuals with autism spectrum disorder through sensor-based monitoring of emotional dysregulation: A scoping review
IF 4.1 | Zone 2, Medicine | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-09 | DOI: 10.1016/j.ijmedinf.2026.106262
Moid Sandhu, Siddique Latif, Andrew Bayor, Wei Lu, Mahnoosh Kholghi, Deepa Prabhu, David Silvera-Tawil
Objective: This paper critically reviews existing work in sensor-based emotional dysregulation monitoring to support caregivers of individuals diagnosed with autism spectrum disorder (ASD).
Methods: A systematic literature search was conducted across six databases (Google Scholar, IEEE Xplore, Scopus, ACM Digital Library, Web of Science, and PubMed) covering publications from January 1, 2016, to September 30, 2025.
Results: Thirty-two studies met inclusion criteria, comprising 27 focused on sensor-based emotional dysregulation detection and 5 addressing intervention or support mechanisms. These studies suggest that sensor-based technologies have potential for continuous physiological monitoring, facilitating early detection and intervention to support emotional dysregulation episodes. Critical deficiencies were identified in real-time alerting capabilities, autonomous intervention deployment, self-regulation framework integration, system reliability, long-term sustainability, user interface design, and cross-environment scalability.
Conclusion: There is a significant need to develop real-time emotion monitoring systems to empower caregivers in delivering timely, targeted interventions for individuals diagnosed with ASD. Future research should prioritise the development of real-time alert systems, autonomous intervention protocols, and solutions optimised for reliability, sustainability, usability, and adaptability across heterogeneous care settings.
International Journal of Medical Informatics, vol. 209, Article 106262. Citations: 0

From promise to practice: strengthening evidence for AI conversational agents in healthcare
IF 4.1 | Zone 2, Medicine | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-09 | DOI: 10.1016/j.ijmedinf.2026.106264
Yang Gao, Yingjie Lu, Xiaofei Li
International Journal of Medical Informatics, vol. 209, Article 106264. Citations: 0

Assessing the safety of patient-centred discharge medication instructions generated by an AI model
IF 4.1 | Zone 2, Medicine | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-08 | DOI: 10.1016/j.ijmedinf.2026.106272
Michael Tang, Kristina Markova, Kristian Stanceski, Sharleen Zhong, Marguerite Tracy, Linda Koria, Sarita Lo, Xumou Zhang, Angela Pan, Jinman Kim, Julie Ayre, Adam G. Dunn

Background

The use of AI to support patient-centred communication could improve health outcomes but little is known about the equity of AI tools. We evaluated the completeness and accuracy of an AI tool that produces patient-centred medication information for patients following discharge from hospital, for different patient groups.

Methods

We evaluated differences in the completeness and safety of AI-generated (GPT-4o) patient-centred medication instructions across age groups, patient complexity, and insurance type. Clinical experts evaluated the AI-generated medication instructions for the proportion of medications that were correctly represented, the proportion described in Universal Medication Schedule (UMS) form, and the presence of safety issues. We tested for significant differences in completeness and safety between groups in 140 discharge summaries sampled from the Medical Information Mart for Intensive Care (MIMIC) database.

Results

The proportion of patient-centred discharge instructions where all medications were included was 95 % (133/140) with a median of 6.0 medications (IQR 3.0–10.0). For most of the 140 cases, all medications from the discharge summary were correctly included (median 100 % included, IQR 83.3 %–100 %) and new medications were rarely added by AI, but a lower proportion of medications were presented in UMS format (median 22.5 %, IQR 0.0 %–92.5 %). Despite most medications being included, potential safety issues were identified in 69.3 % (97/140). There was no evidence of a difference in the correctness of included medications across age groups (p = 0.70), patient complexity (p = 0.72), or insurance type (p = 0.70). There was no evidence of a difference in proportion of medications in UMS format across age groups (p = 0.88), patient complexity (p = 0.94), or insurance type (p = 0.49). There was evidence of a difference in the proportion of cases with at least one potential safety issue across age groups (p = 0.031), patient complexity (p < 0.001) and insurance types (p = 0.047).
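The summary statistics above (percentages, medians, and interquartile ranges) can be reproduced from per-case data along the following lines. This is a minimal sketch: the medication counts are hypothetical, the helper names are illustrative, and the "inclusive" quartile method is an assumption, since the abstract does not state how quartiles were computed.

```python
from statistics import median, quantiles


def summarise(values):
    """Return (median, first quartile, third quartile) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical medications-per-summary counts (illustration only)
medication_counts = [2, 3, 4, 6, 6, 8, 10, 12, 15]
med, q1, q3 = summarise(medication_counts)
print(f"median {med} (IQR {q1}-{q3})")

# Proportions reported in the abstract: 133/140 and 97/140
print(percent(133, 140), percent(97, 140))
```

The group comparisons (p-values) reported in the abstract rely on statistical tests that the abstract does not name, so they are not sketched here.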

Conclusions

We found evidence of a difference in safety issues in AI-generated medication instructions for older, more complex patients, and patients with certain types of insurance. Health system and contextual differences could create unexpected variations in AI-generated outputs. Studies of AI-generated messaging for patients should consider the severity and likelihood of safety issues, localised trials, and ongoing auditing.
{"title":"Assessing the safety of patient-centred discharge medication instructions generated by an AI model","authors":"Michael Tang ,&nbsp;Kristina Markova ,&nbsp;Kristian Stanceski ,&nbsp;Sharleen Zhong ,&nbsp;Marguerite Tracy ,&nbsp;Linda Koria ,&nbsp;Sarita Lo ,&nbsp;Xumou Zhang ,&nbsp;Angela Pan ,&nbsp;Jinman Kim ,&nbsp;Julie Ayre ,&nbsp;Adam G. Dunn","doi":"10.1016/j.ijmedinf.2026.106272","DOIUrl":"10.1016/j.ijmedinf.2026.106272","url":null,"abstract":"<div><h3>Background</h3><div>The use of AI to support patient-centred communication could improve health outcomes but little is known about the equity of AI tools. We evaluated the completeness and accuracy of an AI tool that produces patient-centred medication information for patients following discharge from hospital, for different patient groups.</div></div><div><h3>Methods</h3><div>We evaluated differences in the completeness and safety of AI-generated (GPT-4o) patient-centred medication instructions across age groups, patient complexity, and insurance type. AI-generated medication instructions were evaluated by clinical experts for the proportion of medications that were correctly represented, described in Universal Medication Schedule (UMS) form, and presence of safety issues. We tested for significant differences in completeness and safety between groups in 140 discharge summaries sampled from the Medical Information Mark for Intensive Care (MIMIC) database.</div></div><div><h3>Results</h3><div>The proportion of patient-centred discharge instructions where all medications were included was 95 % (133/140) with a median of 6.0 medications (IQR 3.0–10.0). For most of the 140 cases, all medications from the discharge summary were correctly included (median 100 % included, IQR 83.3 %–100 %) and new medications were rarely added by AI, but a lower proportion of medications were presented in UMS format (median 22.5 %, IQR 0.0 %–92.5 %). 
Despite most medications being included, potential safety issues were identified in 69.3 % (97/140). There was no evidence of a difference in the correctness of included medications across age groups (p = 0.70), patient complexity (p = 0.72), or insurance type (p = 0.70). There was no evidence of a difference in proportion of medications in UMS format across age groups (p = 0.88), patient complexity (p = 0.94), or insurance type (p = 0.49). There was evidence of a difference in the proportion of cases with at least one potential safety issue across age groups (p = 0.031), patient complexity (p &lt; 0.001) and insurance types (p = 0.047).</div></div><div><h3>Conclusions</h3><div>We found evidence of a difference in safety issues in AI-generated medication instructions for older, more complex patients, and patients with certain types of insurance. Health system and contextual differences could create unexpected variations in AI-generated outputs. Studies of AI-generated messaging for patients should consider the severity and likelihood of safety issues, localised trials, and ongoing auditing.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"209 ","pages":"Article 106272"},"PeriodicalIF":4.1,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Interpretable machine learning-based prediction of liver metastasis risk in elderly patients with small cell lung Cancer: A study based on the SEER database and external validation in a Chinese cohort
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-08 DOI: 10.1016/j.ijmedinf.2026.106274
Hang Chen , Wenchao Dai , Jun Yang , Xin Dang , Li Jiang

Purpose

Small cell lung cancer (SCLC) is a highly aggressive malignancy with a high incidence of liver metastases, particularly among elderly patients, which significantly worsens survival outcomes. However, efficient predictive tools targeting this population remain scarce. This study aimed to develop and validate an interpretable machine learning-based model to re-stratify the risk of liver metastasis in elderly patients with SCLC after completion of routine staging evaluation at initial diagnosis.

Methods

A total of 10,080 patients aged ≥60 years with histologically confirmed SCLC were included from the SEER database (2010–2017) and the Affiliated Hospital of North Sichuan Medical College, China (2010–2024). Patients from SEER were randomly assigned to a training set (n = 7719) and an internal validation set (n = 1930), while 431 patients from China comprised the external validation set. Feature selection was performed using the Boruta algorithm, identifying 11 key variables. Seven ML models, namely, Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, XGBoost, and LightGBM, were developed to compare their predictive performance. The optimal model was further interpreted using SHAP (SHapley Additive exPlanations).

Results

The incidence of liver metastasis was approximately 32.89%, 35.39%, and 32.71% in the training, internal validation, and external validation sets, respectively. Comparative analysis across models demonstrated that, in the internal validation set, XGBoost achieved the best overall discriminative performance, with an AUC of 0.820, slightly outperforming LightGBM (0.819), logistic regression (0.813), and random forest (0.811). In the external validation set, the performance of all models declined. Given its relatively superior predictive performance, XGBoost was selected as the final model for interpretability analyses. SHAP analysis indicated that LDS/EDS, tumor stage, bone metastasis, and brain metastasis were the most influential features contributing to the model predictions.
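
For readers less familiar with the AUC values quoted above, the following is a minimal sketch of how such a figure can be computed and how candidate models can be ranked by it. It is not the authors' pipeline: the `roc_auc` helper and the toy labels and scores are hypothetical, and only the four internal-validation AUC values are taken from the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs in which the positive
    case receives the higher score (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = liver metastasis, 0 = no liver metastasis.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)

# Internal-validation AUCs as reported in the abstract.
reported_auc = {
    "XGBoost": 0.820,
    "LightGBM": 0.819,
    "Logistic Regression": 0.813,
    "Random Forest": 0.811,
}
best_model = max(reported_auc, key=reported_auc.get)
print(f"toy AUC = {toy_auc:.3f}; best reported model = {best_model}")
```

The pairwise definition is quadratic in sample size, which is acceptable for illustration; a rank-based implementation would be preferable for cohorts of the size described here.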

Conclusion

The XGBoost-based model exhibited moderate predictive value and satisfactory interpretability in assessing the risk of liver metastasis in patients with SCLC, suggesting its potential utility as an adjunctive decision-support tool following initial diagnostic staging. Nevertheless, its generalizability across different populations requires further validation, and localized recalibration may be necessary prior to broader clinical implementation.
Citations: 0
Incorporating patient history into the insulin sensitivity prediction in intensive care by feedforward neural network models
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-08 DOI: 10.1016/j.ijmedinf.2026.106273
Bálint Szabó , J.Geoffrey Chase , Balázs Benyó

Background and Objective

Insulin sensitivity prediction is crucial for model-based treatment in Intensive Care Unit patients, particularly those with hyperglycemia. However, predicting insulin sensitivity is challenging due to inter- and intra-patient variability.

Methods

Different neural network models are proposed and compared for predicting insulin sensitivity, including recurrent and feedforward versions of the Classification Deep Neural Network and Mixture Density Network models. These models were trained using 1879 patient records containing 123,988 insulin sensitivity values from three intensive care patient cohorts in three different countries.

Results

Results show that using patient history in prediction models can improve the accuracy of insulin sensitivity predictions. The Mixture Density Network model provided more accurate predictions, measured by a problem-specific metric that expresses clinical requirements. We demonstrated that even using up to 12 h of historical data can improve prediction accuracy.

Conclusion

This study highlights the potential of recurrent neural network models in predicting insulin sensitivity in Intensive Care Unit patients. Our findings suggest that using recurrent neural networks and incorporating patient history can lead to more accurate predictions. These results are generalizable due to the large and diverse dataset employed, which included patients from three different cohorts in three care settings.
Citations: 0
Large Language Models’ Performances regarding logical observation identifiers names and codes mapping in laboratory medicine: A comparative analysis of ChatGPT-4.0, Gemini, and Perplexity
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-06 DOI: 10.1016/j.ijmedinf.2026.106270
Shinae Yu , Eun-Jung Cho , Sollip Kim , Kuenyoul Park , Min-Sun Kim , YeJin Oh , Hyejin Ryu

Objectives

This study aimed to assess the feasibility and practical utility of using large language models (LLMs) for Logical Observation Identifiers Names and Codes (LOINC) mapping to standardise healthcare data in the field of laboratory medicine. We evaluated the accuracy and applicability of three LLMs—ChatGPT-4.0 (OpenAI), Gemini 1.5 (Google DeepMind), and Perplexity AI (Perplexity.ai)—in mapping laboratory test items, which typically require considerable institutional-level standardisation efforts.

Methods

A total of 75 representative laboratory test items, including 55 clinical chemistry and 20 hematology tests commonly used in clinical practice, were selected. Six board-certified clinical pathologists independently mapped each test item to its appropriate LOINC code. A consensus mapping was established by the experts and used as the gold standard. Each LLM’s output was compared to this consensus, and the results were categorised as complete match (CM), partial match (PM), or mismatch (MM) based on agreement with the reference.
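
The three-way scoring described above can be sketched as follows. The abstract does not state the exact criteria for a partial match, so the rule used here (all LOINC parts agree except the method) is an assumption for illustration, as are the `LoincParts` type, the helper names, and the sample values.

```python
from collections import Counter
from typing import NamedTuple


class LoincParts(NamedTuple):
    """Hypothetical container for the six parts of a LOINC term."""
    component: str
    prop: str
    time: str
    system: str
    scale: str
    method: str  # empty string when no method is specified


def categorise(candidate: LoincParts, reference: LoincParts) -> str:
    """Assumed rule: CM if all parts agree, PM if only the method differs, else MM."""
    if candidate == reference:
        return "CM"
    if candidate[:5] == reference[:5]:
        return "PM"
    return "MM"


def match_rates(categories):
    """Percentage of CM, PM, and MM results, rounded to one decimal place."""
    counts = Counter(categories)
    total = len(categories)
    return {c: round(100 * counts[c] / total, 1) for c in ("CM", "PM", "MM")}


# Illustrative values only; not data from the study.
reference = LoincParts("Glucose", "MCnc", "Pt", "Ser/Plas", "Qn", "")
candidates = [
    reference,                                   # identical mapping
    reference._replace(method="Test strip"),     # only the method differs
    reference._replace(system="Urine"),          # different specimen
    reference,                                   # identical mapping
]
categories = [categorise(c, reference) for c in candidates]
rates = match_rates(categories)
print(categories, rates)
```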

Results

Overall paired ordinal analyses demonstrated a significant difference in LOINC code-mapping performance among the three models, with Gemini performing significantly worse than both ChatGPT-4.0 and Perplexity AI, and no significant difference between ChatGPT-4.0 and Perplexity AI. ChatGPT-4.0 achieved the highest CM rate in clinical chemistry (58.2%), whereas Perplexity AI performed best in hematology (55.0%). Gemini showed the highest MM rates, particularly in hematology (80.0%), while partial matches were largely attributable to method-related discrepancies rather than fully incorrect mappings.

Conclusion

Structured inputs, localisation to domestic laboratory practices, and expert oversight are critical to improving the reliability of LLM-generated LOINC mappings. While LLMs can reduce workload by generating candidate mappings, human validation remains essential to ensure clinical accuracy. Future improvements should focus on algorithmic refinement, error feedback integration, and adaptation to diverse laboratory settings to enhance accuracy and generalisability in real-world laboratory settings.
Citations: 0
Association between the platelet to white blood cell ratio and short term mortality in critically ill patients with atherosclerotic cardiovascular disease: A retrospective study and machine learning with external validation
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-06 DOI: 10.1016/j.ijmedinf.2026.106267
Zhantao Cao , Hewei Qin , Yue Niu , Zilu Zhang , Guoju Dong

Background

The platelet-to-white blood cell ratio (PWR) has shown prognostic value in many diseases. Yet its predictive utility for patients with atherosclerotic cardiovascular disease (ASCVD) who receive care in the intensive care unit (ICU) remains uncertain. We examined whether PWR at ICU admission is associated with short-term all-cause mortality among ICU patients with ASCVD.

Methods

We used the MIMIC-IV and eICU databases to study the association between PWR and 30-day all-cause mortality in critically ill patients with ASCVD. Patients were grouped by PWR quartiles. Collinearity was checked with the variance inflation factor (VIF). Nonlinearity was assessed with restricted cubic splines (RCS). Survival was compared with Kaplan-Meier (KM) curves and the log-rank test. Hazard ratios were estimated with stratified and adjusted Cox models. We also built machine learning models that included PWR and clinical features selected with the Boruta algorithm to predict 30-day mortality. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC). External validation in the eICU database was used to assess generalizability.
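
The quartile grouping step can be illustrated with a short sketch. This is not the authors' code: the function names and patient values are hypothetical, both counts are assumed to be in the same units (for example 10^9/L), and the quartile cut points use the default method of Python's `statistics.quantiles`, which may differ from the method used in the study.

```python
import bisect
import statistics


def pwr(platelets: float, wbc: float) -> float:
    """Platelet to white blood cell ratio; both counts in the same units."""
    if wbc <= 0:
        raise ValueError("WBC count must be positive")
    return platelets / wbc


def quartile_groups(values):
    """Assign each value to quartile 1 (lowest) to 4 (highest).

    Values equal to a cut point fall into the lower quartile.
    """
    cuts = statistics.quantiles(values, n=4)  # three cut points
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


# Hypothetical (platelets, WBC) pairs, not data from the study.
patients = [(100, 10), (200, 10), (300, 10), (200, 5),
            (250, 5), (300, 5), (350, 5), (400, 5)]
ratios = [pwr(p, w) for p, w in patients]
groups = quartile_groups(ratios)
print(list(zip(ratios, groups)))
```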

Results

A total of 10,943 ICU patients with ASCVD were included; 62% were men, and the median age was 71 years. The highest PWR quartile had lower 30-day all-cause mortality than the lowest quartile (11% versus 15%, p < 0.001). In multivariable Cox models, the highest quartile had a lower risk (HR 0.80, 95% CI 0.67 to 0.95, p = 0.012). RCS suggested a nonlinear association. Age modified the association, with a stronger protective effect in patients younger than 70 years (HR 0.64, 95% CI 0.47 to 0.87, interaction p < 0.001). The best machine learning model achieved an AUC of 0.812 in internal validation and 0.80 in external validation. SHAP analysis showed that higher PWR was linked to a lower predicted risk of death.

Conclusion

PWR independently predicts 30-day all-cause mortality in ICU patients with ASCVD. These findings support the use of PWR for risk stratification and to inform management in critical care for ASCVD.
Citations: 0