Aim: The aim of this study was to introduce OsteoFusionFormer, a dual-branch transformer model for the automatic classification of knee osteoporosis into three groups: Normal, Osteopenia, and Osteoporosis. The objective was to overcome the limitations of single-branch transformers by incorporating both global anatomical context and fine-grained bone features to improve diagnostic accuracy.
Method: OsteoFusionFormer combines two parallel branches: a Vision Transformer (ViT) for global anatomical representation and a Bone-Aware Transformer (BAT) for localised, bone-specific features. The two branches are integrated through a hierarchical dual-fusion strategy. First, cross-attention performs feature-level fusion between the ViT and BAT embeddings. Next, confidence-weighted decision-level fusion uses an auxiliary gating network to compute adaptive weights (α1, α2), yielding a soft-ensemble prediction. Ablation studies systematically removed individual modules (ViT, BAT, gating, CLAHE) to assess their contributions. Interpretability was assessed using attention maps and Grad-CAM++.
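The two fusion stages described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: cross-attention is shown as a single head without the learned query/key/value projections a real transformer would use, and `confidence_gate` is a hypothetical stand-in for the learned auxiliary gating network that simply favours the branch with the higher maximum class probability. All function names and the toy probabilities are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: each query token (e.g. a ViT
    embedding) attends over key/value tokens (e.g. BAT embeddings).
    Learned Q/K/V projections and multiple heads are omitted for brevity."""
    d = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[c] for w, v in zip(weights, values))
                      for c in range(len(values[0]))])
    return fused


def confidence_gate(p_vit, p_bat, temperature=1.0):
    """Hypothetical stand-in for the learned auxiliary gating network:
    returns (alpha1, alpha2), favouring the branch with the higher max probability."""
    return softmax([max(p_vit) / temperature, max(p_bat) / temperature])


def gated_fusion(p_vit, p_bat, alphas):
    """Decision-level soft ensemble: p = alpha1 * p_vit + alpha2 * p_bat."""
    a1, a2 = alphas
    return [a1 * pv + a2 * pb for pv, pb in zip(p_vit, p_bat)]


CLASSES = ["Normal", "Osteopenia", "Osteoporosis"]

# Toy per-branch class probabilities (illustrative values only).
p_vit = [0.20, 0.50, 0.30]
p_bat = [0.05, 0.15, 0.80]
alphas = confidence_gate(p_vit, p_bat, temperature=0.1)
p_fused = gated_fusion(p_vit, p_bat, alphas)
print([round(a, 3) for a in alphas], CLASSES[p_fused.index(max(p_fused))])
# [0.047, 0.953] Osteoporosis
```

With these toy inputs the gate gives most of the weight to the more confident BAT branch, so the fused prediction is Osteoporosis. Because α1 + α2 = 1 and each branch outputs a probability distribution, the fused output is also a valid probability distribution.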
Results: OsteoFusionFormer achieved an overall accuracy of 96.8%, exceeding ViT-only (91.3%), BAT-only (90.1%), and late-fusion averaging (93.6%). Ablation confirmed that performance dropped when each module was removed (BAT: -5.5, ViT: -6.7, gating: -3.2, CLAHE: -4.4 percentage points). Performance was further verified with 15 novel bone-specific indicators, including Bone-Aware Accuracy (96.8%), Trabecular Sensitivity Index (95.2%), Cortical Degeneration Detection Rate (97.6%), Joint Space Narrowing Recall (96.1%), Bone Class Specificity (97.2%), Osteopenia Detection Precision (92.4%), Bone Focus Ratio (91.8%), and Bone Entropy Index (0.26 bits). Visual interpretability showed good agreement with expert assessment (BIAS: 87.3%).
Conclusion: By combining global and local bone features, OsteoFusionFormer improves accuracy, diagnostic sensitivity, and structural focus while providing built-in explainability.
Background: The use of cemented versus cementless components in total knee arthroplasty (TKA) remains a subject of ongoing debate, amid a recent rise in the use of cementless TKA. In this study, we assessed the robustness of outcomes reported in randomized controlled trials (RCTs) comparing cemented and cementless components in TKA.
Methods: PubMed, Embase, and Medline were queried from January 1, 2010, to February 28, 2024, for RCTs with intervention arms comparing cemented and cementless TKA components. The fragility index (FI) was defined as the number of outcome event reversals needed to render a statistically significant outcome non-significant, and the reverse fragility index (rFI) as the number needed to render a non-significant outcome significant. The fragility quotient (FQ) was calculated by dividing the FI or rFI by the study sample size.
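The FI, rFI, and FQ definitions above can be made concrete with a short Python sketch. It assumes binary outcomes analysed with a two-sided Fisher's exact test at α = 0.05 and follows the commonly used procedure of converting non-events to events in one arm until significance changes. The abstract does not state which test or reversal rule the authors used, and the rule in `reverse_fragility_index` (adding events to the arm that already has more events) is an assumption.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]
    (rows = trial arms, columns = events / non-events)."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of the table whose top-left cell is x.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table no more likely than the observed one (tolerance for ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def _p(e1, n1, e2, n2):
    return fisher_two_sided(e1, n1 - e1, e2, n2 - e2)


def _count_flips(e1, n1, e2, n2, bump_arm1, keep_going):
    flips = 0
    while keep_going(_p(e1, n1, e2, n2)):
        if bump_arm1:
            e1 += 1
        else:
            e2 += 1
        if e1 > n1 or e2 > n2:
            return None  # significance cannot be reversed this way
        flips += 1
    return flips


def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """FI: non-events are converted to events in the arm with fewer events
    until a significant result becomes non-significant."""
    if _p(e1, n1, e2, n2) >= alpha:
        raise ValueError("not significant; use reverse_fragility_index")
    return _count_flips(e1, n1, e2, n2, bump_arm1=e1 <= e2,
                        keep_going=lambda p: p < alpha)


def reverse_fragility_index(e1, n1, e2, n2, alpha=0.05):
    """rFI (assumed rule): events are added to the arm that already has more
    events until a non-significant result becomes significant."""
    if _p(e1, n1, e2, n2) < alpha:
        raise ValueError("already significant; use fragility_index")
    return _count_flips(e1, n1, e2, n2, bump_arm1=e1 > e2,
                        keep_going=lambda p: p >= alpha)


def fragility_quotient(index, n_total):
    """FQ: FI (or rFI) divided by the total sample size."""
    return index / n_total


fi = fragility_index(0, 10, 10, 10)
print(fi, fragility_quotient(fi, 20))  # 6 0.3
```

For a hypothetical trial with 0/10 versus 10/10 events, six reversals in the first arm are needed before p rises above 0.05, so FI = 6 and FQ = 6/20 = 0.3.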
Results: Of 176 screened RCTs, 13 met the inclusion criteria, yielding 48 outcomes in total. The 13 statistically significant outcomes had a median FI of 4; thus, the significance of a typical significant outcome could be lost by reversing the outcome events of just 4 patients. Expressed relative to sample size (median FQ 0.041), an outcome event reversal in 4.1% of patients may be sufficient to reverse statistical significance. The median rFI across the 35 non-significant outcomes was 5. In a subanalysis by outcome type, outcomes relating to component migration were the most fragile, with a median FI of just 2. The median FIs for radiolucency and complication outcomes were both 5, whereas those for function and satisfaction were both 8. In 46% of outcomes, the number of patients lost to follow-up exceeded the outcome FI/rFI.
Conclusion: RCTs comparing cemented versus cementless TKA have considerable clinical implications in surgical decision-making, yet the outcomes in these studies are statistically fragile. Standardized reporting of FI, rFI, and FQ metrics may facilitate a more comprehensive assessment of the stability of study outcomes in TKA RCTs.
Purpose: To evaluate the reliability and clinical applicability of the three most commonly used large language models (LLMs) (ChatGPT, Gemini, and Claude) and a domain-specific artificial intelligence (AI) platform (OpenEvidence) in providing recommendations for acute isolated meniscal pathology, to compare their accuracy, and to assess the consistency of AI-generated guidance with the American Academy of Orthopaedic Surgeons (AAOS) Clinical Practice Guidelines (CPG) recommendations.
Methods: An exploratory cross-sectional benchmarking analysis evaluated the concordance of three LLMs (ChatGPT, Gemini, Claude) and one domain-specific AI (OpenEvidence) with the 2024 AAOS clinical practice guidelines for acute isolated meniscal pathology. Nine guideline recommendations were converted into standardized questions and presented to each AI model on the same day. Three orthopedic sports medicine specialists independently rated each response as concordant or discordant, with disagreements resolved by majority decision. Statistical analyses were performed in SPSS 29, using Cochran's Q test to compare concordance across models and Fleiss' kappa to assess inter-rater reliability.
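Cochran's Q compares the proportion of concordant answers across several related binary ratings (here, four models answering the same nine items). The sketch below computes Q from an item-by-model 0/1 matrix and converts it to a p-value with the asymptotic chi-square distribution on k − 1 degrees of freedom, using a closed form valid for integer df. The response matrix is hypothetical: the abstract reports only per-model totals, so this is merely one item-level pattern consistent with them.

```python
import math


def cochran_q(matrix):
    """Cochran's Q for an items x models 0/1 matrix (1 = concordant).
    Returns (Q, degrees of freedom)."""
    k = len(matrix[0])
    col_totals = [sum(row[j] for row in matrix) for j in range(k)]
    row_totals = [sum(row) for row in matrix]
    n_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n_total ** 2)
    # Unanimous items (all 0 or all 1) contribute nothing; Q is undefined
    # (zero denominator) if every item is unanimous.
    denominator = k * n_total - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


# Hypothetical item-level responses; columns: OpenEvidence, ChatGPT, Gemini, Claude.
# Chosen only to match the reported per-model totals (9, 8, 7, 7 of 9).
responses = [[1, 1, 1, 1]] * 6 + [[1, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
q, df = cochran_q(responses)
print(f"Q = {q:.2f}, df = {df}, p = {chi2_sf(q, df):.3f}")  # Q = 3.00, df = 3, p = 0.392
```

This particular pattern yields Q = 3.00 and p = 0.392, matching the values reported below, but because Q also depends on how the discordant answers are spread across items, other patterns with the same per-model totals can give a different Q.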
Results: OpenEvidence achieved perfect concordance (9/9, 100%), followed by ChatGPT (8/9, 89%), and Gemini and Claude (both 7/9, 78%). The overall concordance rate was 86% (31/36). Concordance was 100% for strong and consensus recommendations and 75% for moderate and limited recommendations. Cochran's Q test showed no significant difference among models (Q = 3.00, p = 0.392). Inter-rater reliability demonstrated almost perfect agreement (κ = 0.825, 95% CI: 0.637-1.014).
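Fleiss' kappa, used here for agreement among the three raters, compares the observed proportion of agreeing rater pairs with the agreement expected from the overall category proportions. The following sketch uses a small hypothetical ratings table (three raters classifying four responses as concordant or discordant), not the study's data, and maps κ to the conventional Landis and Koch labels.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] = number of raters assigning item i to
    category j; every item must be rated by the same number of raters.
    Undefined (division by zero) if all ratings fall in one category."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Expected agreement from the overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Conventional verbal benchmark for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


ratings = [[3, 0], [2, 1], [0, 3], [3, 0]]  # columns: concordant, discordant
kappa = fleiss_kappa(ratings)
print(round(kappa, 3), landis_koch(kappa))  # 0.625 substantial
```

Under this convention, the reported κ = 0.825 falls in the 0.81 to 1.00 band labelled almost perfect.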
Conclusions: Although ChatGPT, Gemini, and Claude demonstrated high concordance with the AAOS CPG for acute isolated meniscal pathology, their responses were not consistently guideline-concordant. OpenEvidence achieved the highest descriptive concordance rate (100%); however, statistical superiority could not be established due to the limited number of guideline items. This exploratory benchmarking analysis suggests that domain-specific AI models may represent a valuable tool for retrieving information on acute isolated meniscal injuries.
Clinical relevance: The difference between the domain-specific AI model and general-purpose LLMs underscores the need to educate the public and clinicians about the limitations of general-purpose chatbots, emphasizing that LLM outputs should be interpreted with caution in real-world practice and that tools such as OpenEvidence are available for retrieving evidence-based information.
Background (including aims of the study): Reliable quantification of dynamic anterior tibial translation and internal tibial rotation is essential for advancing anterior cruciate ligament research beyond static clinical assessments. Although optoelectronic motion capture combined with functional calibration techniques has been proposed as an accessible approach, evidence regarding its robustness across examiners, days, and task demands remains limited. This commentary critically appraises the methodological and translational implications of recently reported reliability estimates for dynamic tibiofemoral measurements during walking and jump-landing tasks.
Method: A focused methodological evaluation was undertaken, examining the analytical framework used to assess inter-examiner and inter-day reliability, the interpretation of time-series intraclass correlation coefficients and standard error of measurement, and the alignment of these metrics with clinically meaningful inference. Particular attention was paid to task dependency, biological versus methodological variability, and the relevance of reliability estimates across different phases of movement.
Results: The reported findings indicate generally good-to-excellent reliability for dynamic tibiofemoral measures, with superior consistency during high-load phases of jump landings. However, reliance on pointwise time-normalized metrics may obscure phase-specific variability, and relative error estimates expressed as percentages of range of motion lack direct clinical anchoring. Differences between inter-examiner and inter-day reliability further suggest sensitivity to behavioral and task-execution factors that were not formally quantified.
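The pointwise time-series reliability metrics discussed above can be illustrated with a short sketch. It assumes ICC(2,1) (two-way random effects, absolute agreement, single measures) and SEM = SD × √(1 − ICC), computed independently at each point of a time-normalised curve, with SEM then expressed as a percentage of range of motion (ROM). The appraised study's exact ICC model and SEM formula are not stated here, and the function names and toy values are hypothetical.

```python
import math
import statistics


def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.
    data[i][j] = value for subject i in session j (an examiner or a test day)."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem(data, icc):
    """Standard error of measurement, SEM = SD * sqrt(1 - ICC)."""
    sd = statistics.stdev([x for row in data for x in row])
    return sd * math.sqrt(max(0.0, 1.0 - icc))  # guard against rounding above 1


def pointwise_reliability(curves, rom):
    """curves[i][j][t]: time-normalised curve for subject i, session j.
    Returns (ICC, SEM as % of ROM) at every time point t."""
    results = []
    for t in range(len(curves[0][0])):
        data = [[session[t] for session in subject] for subject in curves]
        icc = icc_2_1(data)
        results.append((icc, 100.0 * sem(data, icc) / rom))
    return results


# Toy data: 3 subjects x 2 sessions x 2 time points (illustrative values in mm).
curves = [
    [[1.0, 1.0], [2.0, 1.0]],
    [[2.0, 2.0], [3.0, 2.0]],
    [[3.0, 3.0], [4.0, 3.0]],
]
for t, (icc, sem_pct) in enumerate(pointwise_reliability(curves, rom=10.0)):
    print(f"t={t}: ICC={icc:.2f}, SEM={sem_pct:.1f}% ROM")
# t=0: ICC=0.67, SEM=6.1% ROM
# t=1: ICC=1.00, SEM=0.0% ROM
```

At the first toy time point, a constant offset between sessions lowers this absolute-agreement ICC to 0.67 even though the ranking of subjects is unchanged. Because every time point is evaluated separately, summarising the resulting curve with a single range or average can mask such phase-specific drops in reliability.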
Conclusion: The evaluated approach represents an important step toward accessible assessment of dynamic knee stability. Future studies may benefit from integrating event-based reliability metrics, contextualizing measurement error against clinically relevant thresholds, and accounting for variability in movement strategy to enhance translational applicability in anterior cruciate ligament research.
Background: Total knee arthroplasty (TKA) is a common orthopaedic procedure often accompanied by significant postoperative pain. Central sensitization, characterized by heightened responsiveness of the central nervous system, may contribute to variability in opioid consumption after TKA. Investigating central sensitization, particularly sex differences in its effects, could improve postoperative care and opioid prescribing practices for TKA patients.
Methods: Patients scheduled for TKA were assessed preoperatively and at 2 and 6 weeks postoperatively. Data collected included demographics, PROMIS-29, Brief Pain Inventory (BPI), Pain Catastrophizing Scale, Central Sensitization Inventory (CSI), and quantitative sensory testing. Participants recorded opioid use in home diaries over 6 weeks, measured in morphine milligram equivalents (MME) and days of use. Analyses included descriptive statistics, comparisons between sexes, correlation analyses, and regression models.
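Converting diary entries to total MME and days of use is simple arithmetic, shown in the minimal sketch below. The conversion factors are those listed in the 2022 CDC opioid prescribing guideline, used here as an assumption because the abstract does not state which conversion table the authors applied. The diary format, the `DiaryEntry` type, and the drug list are hypothetical.

```python
from dataclasses import dataclass

# Oral MME conversion factors (assumed: 2022 CDC guideline values; the
# study's own table is not reported).
MME_FACTORS = {
    "morphine": 1.0,
    "hydrocodone": 1.0,
    "oxycodone": 1.5,
    "tramadol": 0.2,
}


@dataclass
class DiaryEntry:
    day: int            # postoperative day (1 to 42 for a 6-week diary)
    drug: str
    mg_per_dose: float
    doses: int


def total_mme(entries):
    """Sum of mg taken multiplied by each drug's MME conversion factor."""
    return sum(e.mg_per_dose * e.doses * MME_FACTORS[e.drug] for e in entries)


def days_of_use(entries):
    """Number of distinct days with at least one recorded dose."""
    return len({e.day for e in entries if e.doses > 0})


diary = [
    DiaryEntry(1, "oxycodone", 5, 2),
    DiaryEntry(1, "tramadol", 50, 1),
    DiaryEntry(2, "oxycodone", 5, 1),
]
print(total_mme(diary), days_of_use(diary))  # 32.5 2
```

Counting distinct days, rather than diary rows, keeps a day with several different opioids from being counted more than once.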
Results: Thirty-nine participants (59% female) were enrolled. CSI scores indicated subclinical central sensitization in the majority of patients. No sex differences were observed in patient-reported outcomes, but males consumed significantly more opioids than females (median MME: 519 vs 248, p = 0.023), with similar durations of use (median days: 18 vs 16, p = 0.50). Pain levels correlated with the duration of opioid use (rs = 0.50, p = 0.001), more strongly in males (rs = 0.69) than in females (rs = 0.33). Central sensitization correlated with the duration of opioid use in males (rs = 0.81, p < 0.001) but not in females (rs = -0.11, p = 0.64).
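The rs notation conventionally denotes Spearman's rank correlation. A minimal sketch of how it is computed: rank each variable (tied values share their average rank) and take the Pearson correlation of the ranks. The toy inputs are illustrative, and the abstract does not state how ties or p-values were handled.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# e.g. preoperative CSI scores vs days of opioid use (toy values)
print(round(spearman_rs([1, 2, 2, 3], [1, 2, 3, 4]), 3))  # 0.949
```

Because it works on ranks, rs captures any monotonic relationship and is less sensitive than Pearson's r to a few very high opioid totals.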
Conclusion: Central sensitization was significantly associated with opioid use in males after TKA, challenging existing beliefs. Males whose preoperative CSI scores indicated central sensitization showed sustained opioid use above typical norms, suggesting that the CSI may help predict postoperative opioid consumption. Recognizing sex-specific differences in central sensitization could improve pain management and opioid prescribing after TKA.

