Pub Date: 2026-01-14; DOI: 10.1016/j.ijmedinf.2026.106291
Zhihao Lei
Title: “Calibration or contamination?” Reassessing the evaluation of large language models for clinical mortality prediction. Journal: International Journal of Medical Informatics, vol. 209, Article 106291.
Pub Date: 2026-01-12; DOI: 10.1016/j.ijmedinf.2026.106277
Manuri De Silva , Alice Voskoboynik , Sailavan Ramesh , Janice Campbell , Saravanan Satkumaran , Daryl R. Cheng
Objective
Communicable diseases, especially seasonal respiratory illnesses, contribute significantly to paediatric hospital presentations and admissions. Existing surveillance systems often require retrospective manual data collation and focus on either demographic or clinical data, not both. The Communicable Diseases Platform (CDP) is a dynamic data platform that aggregates both data types for all communicable disease presentations to The Royal Children’s Hospital Melbourne (RCH).
Methods
In the pilot phase, the CDP extracted de-identified aggregated data from hospital electronic medical records for patients with positive respiratory swabs. A dashboard displayed positivity rate and cumulative hospital admissions trends from 2016 to 2025, further filterable by pathogen, age, presentation type and interventions.
Discussion
The CDP improves understanding of clinical profiles, disease burden and seasonal patterns, supporting better outbreak control, patient flow prediction and clinical surveillance. Future developments include immunisation data integration and machine learning algorithm evaluation for real-time vaccine effectiveness estimations and communicable disease predictive modelling.
Title: Communicable diseases platform (CDP): Real-Time clinical analytics for infections. Journal: International Journal of Medical Informatics, vol. 209, Article 106277.
Pub Date: 2026-01-11; DOI: 10.1016/j.ijmedinf.2026.106275
Wenyong Wang , Mahnaz Samadbeik , Gaurav Puri , Donald S.A. McLeod , Elton Lobo , Tuan Duong , Titus Kirwa , Clair Sullivan
Background
Electronic Medical Records (EMRs) aim to improve efficiency, safety, and quality of care. However, the impact of EMR implementation, particularly in outpatient diabetes care, remains underexplored. This study explored clinicians’ perspectives on EMR use in diabetes outpatient care.
Methods
This qualitative study, conducted in line with COREQ guidelines, involved four focus groups with 22 clinicians (doctors, nurses, and allied health) at a metropolitan diabetes service in Queensland, Australia. Data were analysed using deductive content analysis, guided by the Quintuple Aim and Technology Acceptance Model/Unified Theory of Acceptance and Use of Technology frameworks.
Results
Clinicians reported mixed outcomes across the Quintuple Aim domains, shaped by technology adoption constructs. Facilitators such as improved efficiency, access to patient information, and prescribing safety reflected perceived usefulness and positive attitudes, contributing to favourable outcomes across multiple Quintuple Aim domains. Barriers such as navigation complexity, technical issues, alert fatigue, and overwhelming training led to negative outcomes in EMR use. Tensions around documentation practices and patient expectations of system use resulted in mixed outcomes. Overall, clinicians viewed EMRs as essential, but sustained adoption required improved usability, tailored training, and better system integration.
Conclusion
This study concludes that while the EMRs improved safety, efficiency, and access to information, their design and implementation also introduced burdens that negatively affected clinician experience. EMRs significantly shape the healthcare workforce, influencing workflow, wellbeing, and professional engagement. In outpatient diabetes care, specific workflow challenges such as glycaemic data integration highlight that existing EMR designs may not fully support the complexity of chronic disease management. To maximise benefits, EMR initiatives should be approached as quality improvement activities, with role-specific training, reliable infrastructure, and clinician involvement in system optimisation. Future research should address usability challenges, enhance integration, and ensure that both clinician and patient perspectives guide digital health transformation.
Title: Clinicians’ perspectives on electronic medical records use in diabetes outpatient Care: A qualitative study. Journal: International Journal of Medical Informatics, vol. 209, Article 106275.
Pub Date: 2026-01-11; DOI: 10.1016/j.ijmedinf.2026.106271
Zhihong Han , Baixin Li , Jie Liu
Background
Aortic dissection (AD) is a critical cardiovascular disorder with substantial risks of short-term mortality. Some researchers have endeavored to utilize machine learning (ML) approaches to develop predictive models for the risk of mortality in AD. However, systematic evidence about the accuracy of these models remains scarce, which poses challenges to the development and enhancement of risk assessment tools. Therefore, this study seeks to systematically review the reliability of ML in forecasting the risk of mortality in AD.
Methods
A search was implemented through PubMed, Cochrane, Embase, and Web of Science up to September 11, 2025. The prediction model risk of bias (RoB) assessment tool (PROBAST) was leveraged to estimate the RoB of the included studies. Subgroup analyses were implemented based upon types of AD and time of death.
Results
In total, 35 studies were included, covering 19,838 patients with AD. The results showed that, within the training datasets, ML models demonstrated a sensitivity (SEN) of 0.75 (95% CI: 0.72–0.78) and specificity (SPE) of 0.77 (95% CI: 0.74–0.80) for predicting mortality in AD. Within the validation set, which mainly focused on TAAD, the SEN was 0.79 (95% CI: 0.74–0.84) and the SPE was 0.78 (95% CI: 0.68–0.85). For in-hospital mortality, the SEN was 0.78 (95% CI: 0.72–0.83) and the SPE was 0.77 (95% CI: 0.65–0.86); for out-of-hospital mortality, the SEN and SPE were 0.81–0.84 and 0.74–0.86.
Conclusion
ML models demonstrate remarkable accuracy in forecasting the risk of mortality in AD and, to some extent, show superior performance relative to existing scoring systems. Future research should incorporate more multi-center, multi-ethnic, and geographically varied cases to develop a more broadly applicable risk prediction tool and offer insights for tailored prevention strategies.
Title: Predictive value of machine learning for mortality risk in aortic dissection: a systematic review and meta-analysis. Journal: International Journal of Medical Informatics, vol. 209, Article 106271.
Pub Date: 2026-01-10; DOI: 10.1016/j.ijmedinf.2026.106276
Xizhi Wu , Madeline S. Kreider , Philip E. Empey , Chenyu Li , Yanshan Wang
Objective
Fluoropyrimidines are widely prescribed for colorectal and breast cancers, but are associated with toxicities such as hand-foot syndrome and cardiotoxicity. Since toxicity documentation is often embedded in clinical notes, we aimed to develop and evaluate natural language processing (NLP) methods to extract treatment and toxicity information.
Materials and methods
We constructed a gold-standard dataset of 236 clinical notes from 204,165 adult oncology patients. Domain experts annotated categories related to treatment regimens and toxicities. We developed rule-based, machine learning-based (Random Forest [RF], Support Vector Machine [SVM], Logistic Regression [LR]), deep learning-based (BERT, ClinicalBERT), and large language model (LLM)-based NLP approaches (zero-shot and error analysis prompting). Five-fold cross-validation was conducted to validate each model.
Results
Error analysis prompting achieved the highest precision, recall, and F1 scores for treatment (F1 = 1.000) and toxicity extraction (F1 = 0.965), whereas zero-shot prompting performed moderately (treatment F1 = 0.889, toxicity extraction F1 = 0.854). The rule-based approach reached F1 = 1.000 for treatment and F1 = 0.904 for toxicity extraction. LR and SVM ranked second and fourth for toxicity extraction (LR F1 = 0.914, SVM F1 = 0.903). Deep learning and RF underperformed: BERT reached F1 = 0.792 for treatment and F1 = 0.837 for toxicity extraction, ClinicalBERT reached F1 = 0.797 for treatment and F1 = 0.884 for toxicity extraction, and RF reached F1 = 0.745 for treatment and F1 = 0.853 for toxicity extraction.
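For readers unfamiliar with the metric, the F1 scores reported above are the harmonic mean of precision and recall. The following is a minimal sketch of that calculation; the function name and the counts are hypothetical and are not taken from the study, whose evaluation code is not described here.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for a metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one extraction category (illustration only)
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because the harmonic mean is pulled towards the lower of the two values, a model cannot reach a high F1 by maximising precision or recall alone.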
Discussion
LLM-based error analysis outperformed all others, followed by machine learning methods. Machine learning and deep learning methods were limited by small training data and showed limited generalizability, particularly for rare categories.
Conclusion
LLM-based error analysis most effectively extracted fluoropyrimidine treatment and toxicity information from clinical notes, and has strong potential to support oncology research and pharmacovigilance.
Title: Automated extraction of fluoropyrimidine treatment and treatment-related toxicities from clinical notes using natural language processing. Journal: International Journal of Medical Informatics, vol. 209, Article 106276.
Pub Date: 2026-01-10; DOI: 10.1016/j.ijmedinf.2025.106246
Ahmet Ugur Atilan, Niyazi Cetin
Objective
Large Language Models (LLMs) are increasingly applied to patient education, yet their performance in languages that are relatively underrepresented in medical-domain corpora and LLM training datasets remains underexplored. Psoriasis and psoriatic arthritis (PsA) are chronic, immune-mediated diseases requiring lifelong patient engagement, making them suitable conditions to evaluate the clarity, reliability, and inclusivity of AI-generated educational content. This study aimed to assess the comprehensibility, scientific reliability, and patient-centered communication of Turkish patient education materials for psoriasis vulgaris and PsA generated by seven state-of-the-art LLMs.
Methods
A cross-sectional analysis compared outputs from ChatGPT-4o, Gemini 2.0 Flash, Claude 3.7 Sonnet, Grok 3, Qwen 2.5, DeepSeek R1, and Mistral Large 2. Brochures were produced using standardized zero-shot prompts and evaluated via the Ateşman readability index and the DISCERN instrument. Overall differences in DISCERN scores across the seven models were assessed using a Friedman test, followed by Bonferroni-adjusted Wilcoxon signed-rank post-hoc analyses.
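As an illustration of the readability measure named above, the Ateşman index is commonly given as 198.825 − 40.175 × (syllables per word) − 2.610 × (words per sentence), with higher scores indicating easier text. The sketch below applies that formula under stated assumptions: the function name, the sentence splitter, and the vowel-counting syllable heuristic (each Turkish syllable contains exactly one vowel) are illustrative choices, not the procedure used in the study.

```python
import re

# Assumption: syllables are approximated by counting vowels,
# since each Turkish syllable contains exactly one vowel.
TURKISH_VOWELS = set("aeıioöuüâîû")


def atesman_score(text):
    """Return the Ateşman readability score for a Turkish text."""
    # Naive sentence split on terminal punctuation (illustrative only)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters (digits and underscores excluded)
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and sentence")
    syllables = sum(
        1 for word in words for ch in word.lower() if ch in TURKISH_VOWELS
    )
    syllables_per_word = syllables / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


# Hypothetical two-sentence sample, not taken from the evaluated brochures
sample = "Sedef hastalığı bulaşıcı değildir. Tedavi mümkündür."
score = atesman_score(sample)
print(round(score, 1))
```

A production implementation would need a more careful sentence tokenizer (abbreviations, decimals) before scores could be compared with published values.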
Results
Readability scores ranged from 61.6 to 80.2 (mean = 71.3 ± 6.9), with ChatGPT-4o and Qwen 2.5 generating the most accessible texts. DISCERN reliability scores ranged from 38.5 to 60.5, with Claude 3.7 Sonnet and Gemini 2.0 Flash showing the highest accuracy. Models prioritizing factual precision produced denser language, while conversational models favored fluency but sacrificed depth. Notable variation was observed, with only Claude 3.7 Sonnet and Gemini 2.0 Flash consistently reflecting patient-centered perspectives.
Conclusion
LLMs showed observable differences in balancing clarity and reliability when generating health education leaflets in Turkish. Most outputs appeared to lack explicit psychosocial framing and emphasis on shared decision-making, which may suggest the need for more culturally adaptive training, clinician oversight, and locally grounded validation frameworks to support safe and inclusive AI-based patient education.
Title: An old disease, a new linguistic challenge for large language models: patient education on psoriasis and psoriatic arthritis in an underrepresented medical language. Journal: International Journal of Medical Informatics, vol. 209, Article 106246.
Pub Date: 2026-01-09; DOI: 10.1016/j.ijmedinf.2026.106262
Moid Sandhu , Siddique Latif , Andrew Bayor , Wei Lu , Mahnoosh Kholghi , Deepa Prabhu , David Silvera-Tawil
Objective: This paper critically reviews existing work in sensor-based emotional dysregulation monitoring to support caregivers of individuals diagnosed with autism spectrum disorder (ASD).
Methods: A systematic literature search was conducted across six databases (Google Scholar, IEEE Xplore, Scopus, ACM Digital Library, Web of Science, and PubMed) covering publications from January 1, 2016, to September 30, 2025.
Results: Thirty-two studies met inclusion criteria, comprising 27 focused on sensor-based emotional dysregulation detection and 5 addressing intervention or support mechanisms. These studies suggest that sensor-based technologies have potential for continuous physiological monitoring, facilitating early detection and intervention to support emotional dysregulation episodes. Critical deficiencies were identified in real-time alerting capabilities, autonomous intervention deployment, self-regulation framework integration, system reliability, long-term sustainability, user interface design, and cross-environment scalability.
Conclusion: There is a significant need to develop real-time emotion monitoring systems to empower caregivers in delivering timely, targeted interventions for individuals diagnosed with ASD. Future research should prioritise the development of real-time alert systems, autonomous intervention protocols, and solutions optimised for reliability, sustainability, usability, and adaptability across heterogeneous care settings.
Title: Empowering caregivers of individuals with autism spectrum disorder through sensor-based monitoring of emotional dysregulation: A scoping review. Journal: International Journal of Medical Informatics, vol. 209, Article 106262.
Pub Date: 2026-01-09; DOI: 10.1016/j.ijmedinf.2026.106264
Yang Gao, Yingjie Lu, Xiaofei Li
Title: From promise to practice: strengthening evidence for AI conversational agents in healthcare. Journal: International Journal of Medical Informatics, vol. 209, Article 106264.
Pub Date: 2026-01-08; DOI: 10.1016/j.ijmedinf.2026.106272
Michael Tang , Kristina Markova , Kristian Stanceski , Sharleen Zhong , Marguerite Tracy , Linda Koria , Sarita Lo , Xumou Zhang , Angela Pan , Jinman Kim , Julie Ayre , Adam G. Dunn
Background
The use of AI to support patient-centred communication could improve health outcomes but little is known about the equity of AI tools. We evaluated the completeness and accuracy of an AI tool that produces patient-centred medication information for patients following discharge from hospital, for different patient groups.
Methods
We evaluated differences in the completeness and safety of AI-generated (GPT-4o) patient-centred medication instructions across age groups, patient complexity, and insurance type. AI-generated medication instructions were evaluated by clinical experts for the proportion of medications that were correctly represented, described in Universal Medication Schedule (UMS) form, and presence of safety issues. We tested for significant differences in completeness and safety between groups in 140 discharge summaries sampled from the Medical Information Mart for Intensive Care (MIMIC) database.
Results
The proportion of patient-centred discharge instructions where all medications were included was 95 % (133/140) with a median of 6.0 medications (IQR 3.0–10.0). For most of the 140 cases, all medications from the discharge summary were correctly included (median 100 % included, IQR 83.3 %–100 %) and new medications were rarely added by AI, but a lower proportion of medications was presented in UMS format (median 22.5 %, IQR 0.0 %–92.5 %). Despite most medications being included, potential safety issues were identified in 69.3 % (97/140). There was no evidence of a difference in the correctness of included medications across age groups (p = 0.70), patient complexity (p = 0.72), or insurance type (p = 0.70). There was no evidence of a difference in proportion of medications in UMS format across age groups (p = 0.88), patient complexity (p = 0.94), or insurance type (p = 0.49). There was evidence of a difference in the proportion of cases with at least one potential safety issue across age groups (p = 0.031), patient complexity (p < 0.001) and insurance type (p = 0.047).
Conclusions
We found evidence of a difference in safety issues in AI-generated medication instructions for older, more complex patients, and patients with certain types of insurance. Health system and contextual differences could create unexpected variations in AI-generated outputs. Studies of AI-generated messaging for patients should consider the severity and likelihood of safety issues, localised trials, and ongoing auditing.
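As a reading aid, the headline proportions in the Results can be recomputed from the reported counts (133/140 and 97/140). The sketch below does that and also attaches a 95 % Wilson score interval to each proportion. The interval is an addition for illustration only: the abstract does not report one, and the function names `percent` and `wilson_interval` are hypothetical helpers, not anything taken from the study.

```python
import math


def percent(count: int, total: int) -> float:
    """Return count/total expressed as a percentage."""
    return 100.0 * count / total


def wilson_interval(count: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95 % interval.
    This interval is NOT reported in the abstract; it is shown only to
    indicate the uncertainty around a proportion from 140 cases.
    """
    p_hat = count / total
    denom = 1.0 + z * z / total
    centre = (p_hat + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / total + z * z / (4 * total * total))
    ) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Counts taken directly from the Results paragraph.
    reported = {
        "all medications included": (133, 140),
        "at least one potential safety issue": (97, 140),
    }
    for label, (count, total) in reported.items():
        low, high = wilson_interval(count, total)
        print(
            f"{label}: {percent(count, total):.1f} % "
            f"(95 % Wilson interval {100 * low:.1f} to {100 * high:.1f} %)"
        )
```

With only 140 discharge summaries, an interval of this kind is fairly wide, which is one reason the group comparisons in the abstract are best read alongside the authors' call for localised trials and ongoing auditing.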
Assessing the safety of patient-centred discharge medication instructions generated by an AI model. International Journal of Medical Informatics, vol. 209, Article 106272.