Background: Considering sex and gender improves research quality, innovation, and social equity, while ignoring them leads to inaccuracies and inefficiency in study results. Despite increasing attention to sex- and gender-sensitive medicine, challenges remain in accurately representing gender due to its dynamic and context-specific nature.
Objective: This work aims to contribute to the implementation of a standard for collecting and assessing gender-specific data in German university hospitals and associated research facilities.
Methods: We carried out a review to identify and categorize state-of-the-art gender scores. We systematically assessed 22 publications regarding the applicability and practicability of their proposed gender scores. Specifically, we evaluated the use of these gender scores on German research data from routine clinical practice, using the Medical Informatics Initiative core dataset (MII CDS).
Results: Different methods for assessing gender have been proposed, but no standardized and validated gender score is available for health research. Most gender scores target epidemiological or public health research, where questions about social aspects and life habits are already part of the questionnaires. Applying gender-scoring concepts to clinical data, however, is challenging. The MII CDS, for example, lacks all of the variables currently used in gender scores. Although some of the required variables are present in routine clinical data, they are not yet part of the MII CDS and would need to be added.
Conclusions: To enable gender-specific retrospective analysis of routine clinical data, we recommend updating and expanding the MII CDS by including more gender-relevant information. For this purpose, we provide concrete action steps on how gender-related variables can be captured in routine clinical practice and represented in a machine-readable way.
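The gap described above, between the variables a gender score requires and the fields a dataset provides, can be illustrated with a simple coverage check. This is only a minimal sketch: the variable and field names are hypothetical placeholders, not taken from any published gender score or from the MII CDS specification.

```python
# Hypothetical variable names, for illustration only. They are NOT taken
# from a published gender score or from the MII CDS specification.
GENDER_SCORE_VARIABLES = {
    "household_responsibility",
    "caregiving_hours",
    "employment_status",
    "education_level",
}


def coverage(required, available):
    """Report which required variables a dataset provides and which it lacks."""
    present = required & available
    missing = required - available
    share = len(present) / len(required) if required else 0.0
    return {"present": sorted(present), "missing": sorted(missing), "share": share}


# Hypothetical field list standing in for a clinical core dataset.
dataset_fields = {"birth_date", "administrative_sex", "diagnosis_code"}

report = coverage(GENDER_SCORE_VARIABLES, dataset_fields)
print(report)
```

A share of 0 in such a report would correspond to the situation described in the Results, where none of the score variables is available in the core dataset.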
Background: Minimally invasive posterior lumbar interbody fusion (MIS-PLIF) is commonly performed to treat degenerative lumbar spinal conditions. Patients of advanced age often present with multiple comorbidities and reduced physiological reserves, influencing surgical risks and recovery. The growing aging population has led to a rising demand for care for older adults, posing significant challenges for health care systems worldwide.
Objective: This study aimed to identify the associations between different age groups and MIS-PLIF outcomes.
Methods: This study retrospectively analyzed data from the United States Nationwide Inpatient Sample collected between 2016 and 2020. Patients aged ≥60 years who underwent MIS-PLIF were eligible for inclusion in this study. Patients were categorized into age groups (60-69, 70-79, and ≥80 y). Logistic and linear regressions were used to determine the associations between the study variables and outcomes, including in-hospital mortality, complications, nonroutine discharge, and length of stay.
Results: A total of 785 patients aged ≥60 (mean age 69.4, SD 0.2) years who underwent MIS-PLIF were included in the analysis, and 18.7% (147/785) experienced at least one complication. After adjustment, compared with patients aged 60 to 69 years, the risk of nonroutine discharge was significantly increased in patients aged 70 to 79 years (adjusted odds ratio 2.33, 95% CI 1.57-3.46; P<.001) and ≥80 years (adjusted odds ratio 4.79, 95% CI 2.64-8.67; P<.001). No significant differences in the risk of complications or length of hospital stay were observed across the age groups.
Conclusions: In older patients undergoing MIS-PLIF, advanced age is an independent predictor of nonroutine discharge. However, our findings suggest that age alone is not an independent risk factor for complications or extended hospital stays among these patients. MIS-PLIF therefore appears to be a viable option for older patients, although extra attention to postoperative care may still be needed. Implementing age-stratified management for older patients undergoing MIS-PLIF may have important clinical and policy implications.
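The adjusted odds ratios above come from logistic regression, where an odds ratio is the exponential of a model coefficient. The sketch below shows that relationship and how a reported interval maps back to a coefficient and standard error. It assumes the intervals are Wald intervals (symmetric on the log-odds scale), which the abstract does not state; the function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def or_with_ci(beta, se, z=Z_95):
    """Turn a log-odds coefficient and its standard error into an
    odds ratio with a Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower, upper, z=Z_95):
    """Recover the coefficient and standard error implied by a Wald CI
    reported on the odds-ratio scale (assumption: the CI is a Wald CI)."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Interval reported for patients aged 70 to 79 years: 95% CI 1.57-3.46.
beta, se = coef_from_ci(1.57, 3.46)
odds_ratio, ci_low, ci_high = or_with_ci(beta, se)
print(round(odds_ratio, 2), round(ci_low, 2), round(ci_high, 2))
```

Under this assumption, the midpoint of the reported interval on the log scale lands close to the reported point estimate of 2.33, which is what a Wald interval would produce.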
Objective: We used the free artificial intelligence (AI) tool Google NotebookLM®, powered by the large language model (LLM) Gemini 2.0, to construct a medical decision-making aid for diagnosing and managing airway diseases, and subsequently evaluated its functionality and performance in the clinical workflow.
Methods: After supplying the tool with relevant published clinical guidelines for these diseases, we evaluated the feasibility of the system with regard to its behavior, ability, and potential. We then created simulated cases and used the system to solve the associated medical problems. The test and simulation questions were designed by a pulmonologist, and the appropriateness (focusing on accuracy and completeness) of the AI responses was judged independently by three pulmonologists. The system was then deployed in an emergency department (ED) setting, where it was tested by medical staff (n=20) to see how it affected the process of clinical consultation. Their opinions were collected through a questionnaire.
Results: Most (58/84=66.7%) of the specialists' ratings regarding AI responses were above average. The inter-rater reliability was moderate on accuracy (intraclass correlation coefficient [ICC]=0.612, P<.001) and good on completeness (ICC=0.773, P<.001). When deployed in an ED setting, the system could respond with reasonable answers and enhance the literacy of personnel about these diseases. The potential to save time spent in consultation did not reach statistical significance (Kolmogorov-Smirnov D=.223, P=.237) across all participants, but indicated a favorable outcome when only physicians' responses were analyzed.
Conclusions: This system is customizable, cost-efficient, and accessible to clinicians and allied professionals who treat airway diseases, without requiring any computer coding experience. It provides convincing guideline-based recommendations, increases the staff's medical literacy, and potentially saves physicians' time spent on consultation. It warrants further evaluation in other medical disciplines and healthcare environments.
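The Kolmogorov-Smirnov D statistic reported above is the largest gap between two cumulative distribution functions. The sketch below computes it for two samples, assuming a two-sample comparison (the abstract does not say whether a one-sample or two-sample test was used). The data are made up for illustration and are not the study's questionnaire responses.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the maximum absolute difference
    between the empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up ratings, for illustration only.
group_1 = [1, 2, 3, 4]
group_2 = [3, 4, 5, 6]
print(ks_statistic(group_1, group_2))
```

A P value for D additionally depends on the sample sizes, which this sketch does not compute.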
Background: Large language models (LLMs), such as GPT-3.5 and GPT-4 (OpenAI), have been transforming virtual patient systems in medical education by providing scalable and cost-effective alternatives to standardized patients. However, systematic evaluations of their performance, particularly for multimorbidity scenarios involving multiple coexisting diseases, are still limited.
Objective: This systematic review aimed to evaluate LLM-based virtual patient systems for medical history-taking, addressing four research questions: (1) simulated patient types and disease scope, (2) performance-enhancing techniques, (3) experimental designs and evaluation metrics, and (4) dataset characteristics and availability.
Methods: Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020, 9 databases were searched (January 1, 2020, to August 18, 2025). Nontransformer LLMs and non-history-taking tasks were excluded. Multidimensional quality and bias assessments were conducted.
Results: A total of 39 studies were included, screened by one computer science researcher under supervision. LLM-based virtual patient systems mainly simulated internal medicine and mental health disorders, with many addressing distinct single disease types but few covering multimorbidity or rare conditions. Techniques like role-based prompts, few-shot learning, multiagent frameworks, knowledge graph (KG) integration (top-k accuracy 16.02%), and fine-tuning enhanced dialogue and diagnostic accuracy. Multimodal inputs (eg, speech and imaging) improved immersion and realism. Evaluations, typically involving 10-50 students and 3-10 experts, demonstrated strong performance (top-k accuracy: 0.45-0.98, hallucination rate: 0.31%-5%, System Usability Scale [SUS] ≥80). However, small samples, inconsistent metrics, and limited controls restricted generalizability. Common datasets such as MIMIC-III (Medical Information Mart for Intensive Care-III) exhibited intensive care unit (ICU) bias and lacked diversity, affecting reproducibility and external validity.
Conclusions: Included studies showed moderate risk of bias, inconsistent metrics, small cohorts, and limited dataset transparency. LLM-based virtual patient systems excel in simulating multiple disease types but lack multimorbidity patient representation. KGs improve top-k accuracy and support structured disease representation and reasoning. Future research should prioritize hybrid KG-chain-of-thought architectures integrated with open-source KGs (eg, UMLS [Unified Medical Language System] and SNOMED-CT [Systematized Nomenclature of Medicine - Clinical Terms]), parameter-efficient fine-tuning, dialogue compression, multimodal LLMs, standardized metrics, larger cohorts, and open-access multimodal datasets to further enhance realism, diagnostic accuracy, fairness, and educational utility.
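Two of the metrics named in the Results, top-k accuracy and the System Usability Scale (SUS), have simple definitions that can be sketched as follows. The top-k function assumes each system returns a ranked list of candidate diagnoses per case. The SUS function follows the standard 10-item scoring rule. All example data are hypothetical and do not come from the reviewed studies.

```python
def top_k_accuracy(ranked_predictions, truths, k):
    """Fraction of cases whose true label appears among the first k
    entries of the ranked prediction list."""
    if not truths or len(ranked_predictions) != len(truths):
        raise ValueError("need one non-empty ranked list per true label")
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, truths) if truth in ranked[:k]
    )
    return hits / len(truths)


def sus_score(responses):
    """Standard SUS scoring: ten items rated 1-5; odd-numbered items
    contribute (rating - 1), even-numbered items contribute (5 - rating),
    and the sum is multiplied by 2.5 to give a 0-100 score."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten ratings between 1 and 5")
    total = 0
    for item, rating in enumerate(responses, start=1):
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5


# Hypothetical ranked differential diagnoses for three simulated cases.
predictions = [
    ["asthma", "copd", "pneumonia"],
    ["copd", "asthma", "bronchitis"],
    ["pneumonia", "asthma", "copd"],
]
truths = ["copd", "bronchitis", "asthma"]
print(top_k_accuracy(predictions, truths, k=2))
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

Because top-k accuracy depends on the choice of k and on how candidate lists are produced, values from different studies are only comparable when both are reported, which is one reason the review calls for standardized metrics.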
Background: AI-enabled customer relationship management (CRM) platforms are increasingly used in healthcare to improve patient services, but real-world evidence about how these systems influence affordability, adherence, and access remains limited. Many enterprises adopt CRM workflows without clear governance, operational definitions, or measurement standards, which leads to inconsistent outcomes and low adoption.
Objective: To summarize early operational lessons from four large enterprise implementations of AI-enabled CRM platforms and describe program-level changes in affordability support, therapy initiation time, and therapy discontinuation rates.
Methods: A case-informed thematic analysis was conducted across four enterprise CRM implementations between 2019 and 2024. Programs included large national healthcare organizations serving more than 500,000 patients annually. Aggregated, de-identified operational dashboards and governance documents were reviewed. Adoption was defined as the proportion of active CRM users among provisioned patient service users. Baseline values were taken from pre-implementation operations and compared with stabilized post-implementation periods. No patient-level or identifiable data were used, and institutional review board approval was not required.
Results: Programs that aligned CRM workflows with patient-centered outcomes showed higher adoption. Active user rates reached more than 85% compared with less than 60% in programs without structured governance. CRM-supported affordability checks showed increased completion rates within service teams. Therapy initiation time improved in programs that used AI-assisted triage. Program-level therapy discontinuation rates decreased when proactive risk flags were incorporated into CRM workflows. These changes reflect descriptive pre-post operational signals and not causal estimates.
Conclusions: AI-enabled CRM platforms can support improvements in patient service operations when supported by clear governance and well-defined metrics. Observed improvements in affordability support, initiation time, and discontinuation rates were program-level trends that require further study with more rigorous designs. The findings provide early lessons for organizations implementing AI-driven CRM systems in healthcare.
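The adoption metric defined in the Methods, active CRM users as a proportion of provisioned patient service users, can be written down directly. The sketch below is illustrative only: the program names and counts are hypothetical and are not the figures behind the reported rates.

```python
def adoption_rate(active_users, provisioned_users):
    """Adoption as defined in the Methods: active CRM users divided by
    provisioned patient service users."""
    if provisioned_users <= 0:
        raise ValueError("provisioned_users must be positive")
    if not 0 <= active_users <= provisioned_users:
        raise ValueError("active_users must lie between 0 and provisioned_users")
    return active_users / provisioned_users


# Hypothetical counts as (active, provisioned), for illustration only.
programs = {
    "program_with_governance": (870, 1000),
    "program_without_governance": (540, 1000),
}

rates = {name: adoption_rate(a, p) for name, (a, p) in programs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

Fixing the denominator (provisioned users) and the definition of "active" before implementation is what makes such rates comparable across programs and between the pre and post periods.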