Background: Large language models (LLMs), such as GPT-3.5 and GPT-4 (OpenAI), have been transforming virtual patient systems in medical education by providing scalable and cost-effective alternatives to standardized patients. However, systematic evaluations of their performance, particularly for multimorbidity scenarios involving multiple coexisting diseases, are still limited.
Objective: This systematic review aimed to evaluate LLM-based virtual patient systems for medical history-taking, addressing four research questions: (1) simulated patient types and disease scope, (2) performance-enhancing techniques, (3) experimental designs and evaluation metrics, and (4) dataset characteristics and availability.
Methods: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines, 9 databases were searched (January 1, 2020, to August 18, 2025). Studies using nontransformer LLMs or addressing tasks other than history-taking were excluded. Multidimensional quality and bias assessments were conducted.
Results: A total of 39 studies, screened by one computer science researcher under supervision, were included. LLM-based virtual patient systems mainly simulated internal medicine and mental health disorders; many addressed distinct single disease types, but few covered multimorbidity or rare conditions. Techniques such as role-based prompts, few-shot learning, multiagent frameworks, knowledge graph (KG) integration (top-k accuracy 16.02%), and fine-tuning enhanced dialogue quality and diagnostic accuracy. Multimodal inputs (eg, speech and imaging) improved immersion and realism. Evaluations, typically involving 10-50 students and 3-10 experts, demonstrated strong performance (top-k accuracy: 0.45-0.98; hallucination rate: 0.31%-5%; System Usability Scale [SUS] score ≥80). However, small samples, inconsistent metrics, and limited controls restricted generalizability. Common datasets such as MIMIC-III (Medical Information Mart for Intensive Care-III) exhibited intensive care unit (ICU) bias and lacked diversity, affecting reproducibility and external validity.
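For illustration, top-k accuracy, the diagnostic metric most often reported by the included studies, measures how frequently the correct diagnosis appears among a model's k highest-ranked candidates. A minimal sketch in Python, with a hypothetical helper function and invented example data:

# Hypothetical helper: fraction of cases whose gold diagnosis appears
# among the model's k highest-ranked candidate diagnoses.
def top_k_accuracy(ranked_predictions, gold_diagnoses, k=3):
    hits = sum(gold in preds[:k]
               for preds, gold in zip(ranked_predictions, gold_diagnoses))
    return hits / len(gold_diagnoses)

# Invented example: the correct diagnosis is in the top 3 for 2 of 3 cases.
preds = [["influenza", "COVID-19", "pneumonia"],
         ["migraine", "tension headache", "sinusitis"],
         ["gastritis", "peptic ulcer", "GERD"]]
gold = ["COVID-19", "cluster headache", "GERD"]
print(top_k_accuracy(preds, gold))  # ~0.67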
Conclusions: Included studies showed moderate risk of bias, inconsistent metrics, small cohorts, and limited dataset transparency. LLM-based virtual patient systems excel in simulating multiple disease types but lack multimorbidity patient representation. KGs improve top-k accuracy and support structured disease representation and reasoning. Future research should prioritize hybrid KG-chain-of-thought architectures integrated with open-source KGs (eg, UMLS [Unified Medical Language System] and SNOMED-CT [Systematized Nomenclature of Medicine - Clinical Terms]), parameter-efficient fine-tuning, dialogue compression, multimodal LLMs, standardized metrics, larger cohorts, and open-access multimodal datasets to further enhance realism, diagnostic accuracy, fairness, and educational utility.
Background: Accurately predicting left ventricular ejection fraction (LVEF) recovery after percutaneous coronary intervention (PCI) in patients with chronic coronary syndrome (CCS) is crucial for clinical decision-making.
Objective: This study aimed to develop and compare multiple machine learning (ML) models to predict LVEF recovery and identify key contributing features.
Methods: We retrospectively analyzed 520 patients with CCS from the Clinical Deep Data Accumulation System database. Patients were categorized into 4 binary classification tasks based on baseline LVEF (≥50% or <50%) and degree of recovery: (1) good recovery, defined as an LVEF increase of >10% versus no increase (≤0%); and (2) normal recovery, defined as an LVEF increase of 0% to 10% versus no increase (≤0%). For each task, 3 feature selection strategies (all features, least absolute shrinkage and selection operator [LASSO] regression, and recursive feature elimination [RFE]) were combined with 4 ML algorithms (extreme gradient boosting [XGBoost], categorical boosting, light gradient boosting machine, and random forest), resulting in 48 models. Models were evaluated using 10-fold cross-validation and assessed by the area under the curve (AUC), decision curve analysis, and calibration plots.
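To make the grid concrete, one cell of it (RFE feature selection feeding XGBoost, scored by 10-fold cross-validated AUC) might look as follows; this is a minimal sketch with placeholder data and hyperparameters, not the authors' code:

# One cell of the 48-model grid: RFE feature selection feeding XGBoost,
# scored by 10-fold cross-validated AUC. Data and hyperparameters are
# placeholders, not the study's.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((520, 40))               # 520 patients x 40 candidate features
y = rng.integers(0, 2, 520)             # binary recovery label

pipe = Pipeline([
    ("rfe", RFE(XGBClassifier(n_estimators=100), n_features_to_select=12)),
    ("clf", XGBClassifier(n_estimators=300)),
])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()
print(f"10-fold CV AUC: {auc:.2f}")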
Results: The highest AUCs were achieved by RFE combined with XGBoost (AUC=0.93) for preserved LVEF with good recovery, LASSO combined with XGBoost (AUC=0.79) for preserved LVEF with normal recovery, LASSO combined with XGBoost (AUC=0.88) for reduced LVEF with good recovery, and RFE combined with XGBoost (AUC=0.84) for reduced LVEF with normal recovery. Shapley additive explanations (SHAP) analysis identified uric acid, platelets, hematocrit, brain natriuretic peptide, glycated hemoglobin, glucose, creatinine, baseline LVEF, left ventricular end-diastolic internal diameter, heart rate, R wave amplitude in V5, and R wave amplitude in V6 as important predictive factors of LVEF recovery.
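A SHAP feature ranking of this kind is typically produced with the shap package's tree explainer; the sketch below assumes a fitted XGBoost model and uses placeholder data:

# Sketch of a SHAP analysis for a fitted tree ensemble; data are placeholders.
import numpy as np
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((520, 12))                       # placeholder feature matrix
y = rng.integers(0, 2, 520)                     # placeholder recovery labels

model = XGBClassifier(n_estimators=300).fit(X, y)
explainer = shap.TreeExplainer(model)           # efficient for tree ensembles
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)               # beeswarm plot ranking features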
Conclusions: ML models incorporating feature selection strategies demonstrated strong predictive performance for LVEF recovery after PCI. These interpretable models may support clinical decision-making and can improve the management of patients with CCS after PCI.
Background: Although electronic medical records (EMRs) play a vital role in strengthening the health care system by improving efficiency, data management, and patient care, their development in Ethiopia is still in its early stages. Hence, most public health care facilities manage their patient information using paper-based recording, which results in errors, delays, and reduced service quality.
Objective: This study aims to determine the level of acceptance of the EMR system and describe contributing factors.
Methods: A cross-sectional study was conducted at health care facilities in Bahir Dar City, Northwestern Ethiopia. A total of 322 health workers participated in the study, drawn from 5 health facilities that had implemented the EMR system. Descriptive statistics and bivariate and multivariate binary logistic regression were used to determine factors associated with EMR acceptance, which was computed from the mediating factors of perceived ease of use and perceived usefulness, measures considered more appropriate in early-stage implementation.
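As a sketch of the multivariable step, odds ratios (ORs) with 95% CIs are obtained by exponentiating logistic regression coefficients; the predictor names and data below are illustrative, not the study's actual coding:

# Multivariable binary logistic regression producing ORs with 95% CIs.
# Predictor names and data are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "acceptance": rng.integers(0, 2, 322),       # good vs poor EMR acceptance
    "experience_gt10y": rng.integers(0, 2, 322),
    "owns_pc": rng.integers(0, 2, 322),
    "emr_training": rng.integers(0, 2, 322),
})

fit = smf.logit("acceptance ~ experience_gt10y + owns_pc + emr_training",
                data=df).fit(disp=0)
table = pd.concat([np.exp(fit.params), np.exp(fit.conf_int())], axis=1)
table.columns = ["OR", "95% CI lower", "95% CI upper"]
print(table)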
Results: Of the 322 respondents, 256 (73%; 95% CI 67.4%-78.2%) had good acceptance of EMRs. Regression analysis identified the following significant predictors: work experience over 10 years (odds ratio [OR] 14.32, 95% CI 4.60-44.58), income dissatisfaction (OR 0.28, 95% CI 0.10-0.82), owning a personal computer (OR 11.08, 95% CI 4.03-30.24), EMR-specific training (OR 4.71, 95% CI 1.52-14.54), basic electronic health management information system/district health information system 2 training (OR 3.06, 95% CI 1.02-9.17), and system usability (OR 38.24, 95% CI 12.26-119.27).
Conclusions: The study demonstrated a moderate level of EMR acceptance among health care workers, with system usability identified as the strongest predictor. Significant factors influencing EMR acceptance included longer work experience, ownership of a personal computer, and prior EMR or electronic health management information system/district health information system 2 training. Context-specific strategies are needed to enhance system usability, provide targeted digital health training, and improve access to technological resources in order to support broader EMR adoption in health care settings.
Background: Psoriasis is a chronic inflammatory skin disorder that has been increasingly linked to metabolic imbalances, particularly obesity. Conventional anthropometric indicators such as BMI and waist circumference (WC) may not sufficiently capture body fat distribution or reflect metabolic risk. The body roundness index (BRI), which integrates both height and waist measurements, has emerged as a potentially superior metric, though its relevance to psoriasis risk remains underexplored.
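For reference, the BRI combines waist circumference and height through the body's implied eccentricity; a direct transcription of the commonly used formula (attributed to Thomas et al), with both inputs in meters:

# Body roundness index (BRI), as commonly defined (Thomas et al):
#   BRI = 364.2 - 365.5 * sqrt(1 - (WC / (2*pi))**2 / (0.5 * height)**2)
# with waist circumference (WC) and height both in meters.
import math

def body_roundness_index(waist_m, height_m):
    eccentricity_sq = (waist_m / (2 * math.pi)) ** 2 / (0.5 * height_m) ** 2
    return 364.2 - 365.5 * math.sqrt(1 - eccentricity_sq)

print(round(body_roundness_index(0.95, 1.75), 1))  # WC 95 cm, height 175 cm -> ~4.2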
Objective: This study aimed to investigate the use of BRI as a digital biomarker for assessing psoriasis risk and to compare its predictive strength against BMI and WC across various demographic and metabolic subgroups using data from a nationally representative sample.
Methods: A cross-sectional analysis was conducted using data from 13,798 adults aged 20 to 59 years who participated in the National Health and Nutrition Examination Survey between 2003 and 2006 as well as between 2009 and 2014. Psoriasis status was self-reported. Anthropometric measures (BRI, BMI, and WC) were calculated from standardized physical assessments. Weighted multivariable logistic regression models and restricted cubic spline analyses were used to examine associations while adjusting for demographic, metabolic, and lifestyle variables. A nomogram was constructed to quantify the relative predictive contributions of each metric.
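A minimal sketch of the weighted logistic model follows, assuming a simplified weighting scheme and placeholder variable names; real NHANES analyses use continuous weights and design-based variance estimation with strata and primary sampling units:

# Weighted logistic regression of psoriasis status on BRI, adjusted for age
# and sex. Data, names, and integer weights are placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "psoriasis": rng.integers(0, 2, 1000),
    "bri": rng.uniform(1, 16, 1000),
    "age": rng.uniform(20, 59, 1000),
    "female": rng.integers(0, 2, 1000),
    "weight": rng.integers(1, 4, 1000),           # simplified survey weights
})

fit = smf.glm("psoriasis ~ bri + age + female", data=df,
              family=sm.families.Binomial(),
              freq_weights=df["weight"].to_numpy()).fit()
print(np.exp(fit.params["bri"]))                  # OR per unit increase in BRI
# Restricted cubic spline terms could replace the linear bri term,
# eg, via a spline basis such as patsy's cr(bri, df=4).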
Results: BRI exhibited a strong linear association with psoriasis risk (odds ratio [OR] 1.11 per unit increase, 95% CI 1.05-1.17; P<.001), outperforming BMI (OR 1.03) and WC (OR 1.01). Tertile analysis revealed a 1.73-fold increased risk of psoriasis in the highest BRI group (P=.003). Subgroup analyses confirmed consistent associations across age, sex, race or ethnicity, and metabolic status (P for interaction >.05). The nomogram highlighted BRI as the most influential predictor, indicated by its broad scoring range.
Conclusions: BRI shows stronger and more consistent associations with psoriasis risk than BMI or WC, supporting its potential role as a digital biomarker for early risk stratification. Incorporating BRI into clinical decision-making tools may enhance personalized approaches to psoriasis prevention and management.
Background: Autism spectrum disorder (ASD) is a prevalent neurodevelopmental condition that can be difficult to diagnose because currently used behavioral assessments lack objective diagnostic measures. Recent work has shown that children with ASD have a higher incidence of motor control differences. Across studies, between 50% and 88% of children with ASD show issues with movement control based on standardized motor assessments or parent-reported questionnaires.
Objective: In this study, we assess a variety of deep learning approaches for the classification of ASD, utilizing data collected via inertial measurement unit (IMU) hand tracking during goal-directed arm movements.
Methods: IMU hand tracking data were recorded from 41 school-aged children with and without an ASD diagnosis to track their arm movements during a reach-to-clean-up task. The IMU data were then preprocessed using a moving average filter and z score normalization to prepare them for the deep learning models. We evaluated the effectiveness of different deep learning models on the preprocessed data using both a k-fold cross-validation approach and a patient-separated approach.
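A minimal sketch of the described preprocessing, assuming a (samples × channels) array layout and an arbitrary smoothing window:

# Moving-average smoothing followed by per-channel z score normalization.
# Window size and the (samples x channels) layout are assumptions.
import numpy as np

def preprocess_imu(signal, window=5):
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda ch: np.convolve(ch, kernel, mode="same"), 0, signal)
    return (smoothed - smoothed.mean(axis=0)) / smoothed.std(axis=0)

trial = np.random.default_rng(0).random((500, 6))  # 500 samples x 6 IMU channels
print(preprocess_imu(trial).shape)                 # (500, 6)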
Results: The best result was achieved with a convolutional autoencoder combined with long short-term memory (LSTM) layers, reaching an accuracy of 90.21% and an F1-score of 90.02%. Once the convolutional autoencoder+LSTM was determined to be the most effective model for this data type, it was retrained and evaluated on a patient-separated dataset to assess its generalization capability, achieving an accuracy of 91.87% and an F1-score of 93.66%.
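One plausible arrangement of such a model, in which the convolutional encoder half of an autoencoder feeds an LSTM layer for binary classification, is sketched below; layer sizes and the input shape are assumptions, since the abstract does not specify the exact architecture:

# Convolutional encoder (the front half of an autoencoder) feeding an LSTM
# layer for binary ASD classification. Layer sizes, kernel widths, and the
# input shape are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

inputs = tf.keras.Input(shape=(500, 6))             # time steps x IMU channels
x = layers.Conv1D(32, 7, padding="same", activation="relu")(inputs)
x = layers.MaxPooling1D(2)(x)
x = layers.Conv1D(16, 7, padding="same", activation="relu")(x)
x = layers.MaxPooling1D(2)(x)                       # compressed representation
x = layers.LSTM(64)(x)                              # temporal summary
outputs = layers.Dense(1, activation="sigmoid")(x)  # ASD vs typically developing

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()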
Conclusions: Our results demonstrate that these models hold potential for facilitating ASD diagnosis in clinical settings. This work confirms that there are significant differences between the physical movements of typically developing children and children with ASD, and that these differences can be identified by analyzing hand-eye coordination skills. Additionally, we showed that small-scale models can achieve high accuracy and good generalization when classifying medical data, opening the door for future research into diagnostic models that may not require massive amounts of data.

