Background: Knee osteoarthritis (KOA) is one of the most prevalent chronic musculoskeletal disorders among the older adult population. Screening populations at risk of rapid progression of osteoarthritis and implementing appropriate early intervention strategies is advantageous for the treatment and prognosis of affected patients.
Objective: This study aimed to construct and validate a nomogram model based on x-ray radiomics to effectively identify individuals experiencing progression of KOA pain.
Methods: The Foundation for the National Institutes of Health Biomarkers Consortium included a total of 600 participants, who were classified as pain progressors (n=297, 49.5%) or non-pain progressors (n=303, 50.5%) according to whether the Western Ontario and McMaster Universities Osteoarthritis Index pain score increased by ≥9 points (on a scale from 0 to 100) during the follow-up period of 24 to 48 months. X-rays that lacked defined spacing in the DICOM image were excluded. Subchondral bone regions on the inner and outer edges of the tibia and femur were selected fully automatically as regions of interest, and radiomics features were extracted for different combinations of these regions. Least absolute shrinkage and selection operator regression was used to select features and generate a radiomics score, and Shapley additive explanations were used for interpretability. The radiomics score, along with clinical indicators, was incorporated into nomograms using a multivariable logistic regression model. The subgroup analysis was restricted to cases with pain progression and cases with no progression at all. Receiver operating characteristic curves, calibration curves, and decision curves were used to assess discrimination, calibration, and clinical utility, respectively.
Results: A total of 450 participants were included in the study. Shapley additive explanations analysis identified Wavelet-HH_gldm_HighGrayLevelEmphasis as the primary radiomics feature. Nomogram 1 and nomogram 2 for predicting KOA pain progression achieved area under the curve values of 0.766 and 0.753, respectively, with mean absolute errors of 0.012 and 0.008, respectively, in the calibration curves. Decision curve analysis showed a positive net benefit across a range of threshold probabilities. In subgroup analyses, nomogram 3 and nomogram 4 yielded areas under the curve of 0.795 and 0.740, respectively.
Conclusions: The nomograms based on x-ray radiomics demonstrated excellent predictive capability and accuracy in forecasting the progression of KOA pain.
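The outcome definition and the discrimination metric described in the Methods above can be illustrated with a minimal sketch. It assumes the ≥9-point rule is applied to the difference between follow-up and baseline pain scores, and it computes the area under the curve by pairwise comparison rather than with the software the authors used. The function names and all numbers are hypothetical and are not study data.

```python
def is_pain_progressor(baseline, follow_up, threshold=9.0):
    """Return True if the pain score (0-100 scale) rose by at least `threshold`."""
    return (follow_up - baseline) >= threshold


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # A positive case scored above a negative case counts 1; a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical (baseline, follow-up) pain scores, for illustration only.
pairs = [(20, 35), (40, 42), (10, 19), (55, 50)]
labels = [is_pain_progressor(b, f) for b, f in pairs]

# Hypothetical predicted probabilities of progression from some model.
scores = [0.81, 0.40, 0.35, 0.22]

print(labels)
print(auc(labels, scores))
```

An area under the curve of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.740 to 0.795.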
Background: Psychiatric disorders are diagnostically challenging and often rely on subjective clinical judgment, particularly in resource-limited settings. Large language models (LLMs) have demonstrated potential in supporting psychiatric diagnosis; however, robust evidence from large-scale, real-world clinical data remains limited.
Objective: This study aimed to evaluate and compare the diagnostic performance of multiple LLMs for psychiatric disorders using multicenter real-world electronic health records (EHRs).
Methods: We retrospectively analyzed 9923 inpatient EHRs collected from 6 psychiatric centers across China, encompassing all ICD-10 (International Statistical Classification of Diseases, Tenth Revision) psychiatric categories. In total, 3 LLMs, namely GPT-4.0 (OpenAI), GPT-3.5 (OpenAI), and GLM-4-Plus (Zhipu AI), were evaluated against physician-confirmed discharge diagnoses. Diagnostic performance was assessed using strict accuracy criteria and lenient classification metrics, with subgroup analyses conducted across diagnostic categories and age groups.
Results: GPT-4.0 achieved the highest overall strict diagnostic accuracy (71.7%) and the highest weighted F1-score under lenient evaluation (0.881), particularly for high-prevalence disorders, such as mood disorders and schizophrenia spectrum disorders. Diagnostic performance varied across age groups, with the highest accuracy observed in older adult patients (up to 79.5%) and lower accuracy in adolescents. Across centers, model performance remained stable, with no significant intercenter differences.
Conclusions: LLMs, especially GPT-4.0, demonstrate promising capability in supporting psychiatric diagnosis using real-world EHRs. However, diagnostic performance varies by age group and disorder category. LLMs should be regarded as assistive tools rather than replacements for clinical judgment, and further validation is needed before routine clinical implementation.
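The two evaluation levels named in the Methods above can be sketched as follows. The abstract does not define "strict" and "lenient", so this example assumes that strict means an exact match of the full ICD-10 code and that lenient means a match at the 3-character category level. Both definitions, the helper names, and the codes are assumptions for illustration only and are not study data.

```python
from collections import Counter


def strict_accuracy(truth, pred):
    """Share of records whose predicted code equals the reference code exactly."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def weighted_f1(truth, pred):
    """Per-class F1 averaged with weights equal to each class's true frequency."""
    support = Counter(truth)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(truth, pred))
        fp = sum(t != cls and p == cls for t, p in zip(truth, pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(truth)


def to_category(code):
    """Assumed lenient mapping: keep the 3-character category (F32.1 -> F32)."""
    return code.split(".")[0]


# Hypothetical reference and predicted codes, for illustration only.
truth = ["F32.1", "F20.0", "F31.2", "F32.2"]
pred = ["F32.2", "F20.0", "F32.1", "F32.2"]

print(strict_accuracy(truth, pred))
print(weighted_f1([to_category(c) for c in truth], [to_category(c) for c in pred]))
```

Because the weighted F1-score weights each class by its frequency, high-prevalence categories dominate the aggregate, which is consistent with the abstract reporting the strongest results for high-prevalence disorders.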
Background: Adult-type diffuse glioma (ADG) is the most common primary malignant tumor of the central nervous system. Its highly invasive nature, marked heterogeneity, and resistance to therapy contribute to a high risk of recurrence and poor prognosis. At present, the lack of reliable prognostic tools poses a significant barrier to the development of individualized treatment strategies.
Objective: This study aimed to develop an effective prognostic model for ADG by integrating multiple machine learning algorithms, in order to enhance the precision of individualized clinical decision-making.
Methods: In this retrospective study, 160 newly diagnosed patients with ADG who underwent surgical resection and histopathological confirmation at our institution between June 2019 and September 2021 were included. A total of 32 variables, including clinical characteristics, molecular biomarkers, and preoperative hematological indicators, were collected. Overall survival (OS) and progression-free survival (PFS) were defined as the study endpoints. Feature selection was performed using least absolute shrinkage and selection operator regression, extreme gradient boosting, and random forest algorithms. Kaplan-Meier survival curves and log-rank tests were used for survival analysis. Multivariate Cox proportional hazards models were constructed to identify independent prognostic factors, and nomograms were developed accordingly. The model's discriminative ability, calibration, and clinical utility were evaluated using the concordance index, area under the receiver operating characteristic curve (area under the curve), calibration plots, and Kaplan-Meier analysis.
Results: Age, neutrophil percentage-to-albumin ratio (NPAR), and platelet-to-mean platelet volume ratio were identified as independent prognostic factors for OS, while age and NPAR were independent predictors for PFS (all P<.001). The prognostic models based on these variables demonstrated good predictive performance, with concordance index values of 0.731 and 0.763 for the training and validation cohorts in the OS model, respectively. The PFS model also showed robust performance. Area under the curve values and calibration curves further supported the models' accuracy and stability. Risk stratification analysis revealed clear survival differences between risk groups (all P<.05), indicating strong clinical applicability.
Conclusions: This study is the first to identify preoperative NPAR as a significant prognostic biomarker for ADG using machine learning approaches. The prognostic model incorporating NPAR, platelet-to-mean platelet volume ratio, and age demonstrated favorable predictive performance, offering a novel perspective for accurate risk stratification and personalized treatment in patients with ADG.
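The NPAR and the concordance index mentioned above can be sketched in a few lines. The abstract does not give a formula or units for NPAR, so this example assumes the plain ratio of neutrophil percentage (%) to serum albumin (g/dL). It also uses NPAR alone as the risk score, which is a simplification of the multivariate Cox models the study describes. The function names and patient values are hypothetical and are not study data.

```python
def npar(neutrophil_pct, albumin_g_dl):
    """Neutrophil percentage-to-albumin ratio (assumed units: % and g/dL)."""
    return neutrophil_pct / albumin_g_dl


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score ranks correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had the event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical patients: survival time in months, event flag (1 = death, 0 = censored).
times = [6, 12, 18, 24]
events = [1, 1, 0, 1]
neutrophil_pct = [80, 60, 70, 72]
albumin = [3.2, 4.0, 4.0, 4.5]

# Simplification: use NPAR itself as the risk score (higher = worse prognosis).
risks = [npar(n, a) for n, a in zip(neutrophil_pct, albumin)]
print(concordance_index(times, events, risks))
```

A concordance index of 0.5 indicates ranking no better than chance and 1.0 indicates perfect ranking, so the reported values of 0.731 and 0.763 sit between the two.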
Background: Electronic health records (EHRs) have the potential to improve service delivery through record keeping and monitoring health outcomes. As countries move toward universal health coverage, digital health tools such as EHRs are essential for achieving this goal. However, EHR implementation in middle-income countries like South Africa faces obstacles.
Objective: This study explores the reasons behind a stalled implementation of the electronic tick register (E-tick) system (an electronic version of a paper primary health care register to record services provided), using the Consolidated Framework for Implementation Research.
Methods: Using a qualitative design, in-depth interviews were conducted with 38 participants to explore their perceptions and experiences, and the factors surrounding the success and stalling of E-ticks. Participants included managers, stakeholders, implementers, and end users from the 3 implementation clinics. Data were collected using semistructured interview guides. Thematic analysis and framework analysis based on the Consolidated Framework for Implementation Research domains (innovation, inner setting, individual characteristics, implementation process, and outer setting) were applied.
Results: The E-tick system was designed to improve data quality in paper health registers, addressing inaccuracies in reporting to district and provincial health departments (innovation domain). Implementers iteratively developed the system through user input from managers and clinicians, and stakeholder engagement of software developers, funders, health managers, and decision-makers from the provincial health department (individual characteristics domain). Although the system was initially well adopted by end users, it stalled primarily due to outer setting factors, which included a change of developers, funding cuts, and limited support at the provincial health department level due to capacity gaps, political appointments, and mistrust stemming from corruption and abuse of the tender system. Moreover, resistance to leveraging lessons from locally developed small-scale systems further constrained institutional support for the E-tick.
Conclusions: Although successful implementation of EHRs can be facilitated by strong user engagement and co-design, outer setting factors such as governance, funding, and policy alignment can pose significant threats to sustainability. This underscores the importance of effective synergy between top-down and bottom-up processes for successful implementation.
Unlabelled: Retrieval-augmented generation (RAG) systems have emerged as a powerful technique to enhance the capabilities of large language models by enabling them to access external, up-to-date knowledge in real time, and RAG systems are being increasingly adopted by researchers in the medical field. In this viewpoint article, we explore the ethical imperatives for implementing RAG systems in clinical nursing environments, with particular attention to how these technologies affect patient care quality and safety. The purpose of this paper is to examine the ethical risks introduced by RAG-enhanced large language models in clinical nursing and to propose strategic guidelines for their responsible implementation. Key considerations include ensuring accuracy, fairness, transparency, and accountability, as well as maintaining essential human oversight, as discussed through a structured analysis. We argue that robust data governance, explainable artificial intelligence (AI) techniques, and continuous monitoring are critical components of a responsible RAG implementation strategy. Ultimately, realizing the benefits of RAG while mitigating ethical concerns requires sustained collaboration among health care professionals, AI developers, and policymakers, fostering a future where AI supports patient safety, reduces disparities, and improves the quality of nursing care.
Background: Considering sex and gender improves research quality, innovation, and social equity, while ignoring them leads to inaccuracies and inefficiency in study results. Despite increasing attention to sex- and gender-sensitive medicine, challenges remain in accurately representing gender due to its dynamic and context-specific nature.
Objective: This work aims to contribute to the implementation of a standard for collecting and assessing gender-specific data in German university hospitals and associated research facilities.
Methods: We carried out a review to identify and categorize state-of-the-art gender scores. We systematically assessed 22 publications regarding the applicability and practicability of their proposed gender scores. Specifically, we evaluated the use of these gender scores on German research data from routine clinical practice, using the Medical Informatics Initiative core dataset (MII CDS).
Results: Different methods for assessing gender have been proposed, but no standardized and validated gender score is available for health research. Most gender scores target epidemiological or public health research where questions about social aspects and life habits are already part of the questionnaires. However, it is challenging to apply concepts for gender scoring on clinical data. The MII CDS, for example, lacks all variables currently being recorded in gender scores. Although some of the required variables are indeed present in routine clinical data, they need to become part of the MII CDS.
Conclusions: To enable gender-specific retrospective analysis of routine clinical data, we recommend updating and expanding the MII CDS by including more gender-relevant information. For this purpose, we provide concrete action steps on how gender-related variables can be captured in routine clinical practice and represented in a machine-readable way.

