To evaluate the feasibility of correlation-weighted averaging factor (CWAF) in liver diffusion-weighted imaging (DWI).
Materials and methods
This prospective study included 103 participants who underwent liver MRI. DWI images were reconstructed using both the original data (DWIOriginal) and CWAF-corrected data (DWICWAF). Two radiologists independently assessed high-b DWI images for overall image quality, image noise, hepatic edge sharpness, and lesion conspicuity in the right and left lobes using five-point scales. Signal intensity ratio (SIR) and apparent diffusion coefficient (ADC) values were measured in four hepatic segments and in liver lesions, with lesion measurements analyzed separately for each lobe. These parameters were compared between the two image sets.
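As a rough illustration of how an ADC value follows from DWI signal intensities, the sketch below computes ADC from two b-values under a mono-exponential decay model. The abstract does not state the b-values or the fitting method actually used, so the function name `adc_two_point` and all example numbers are assumptions made for illustration only.

```python
import math


def adc_two_point(s_low: float, s_high: float, b_low: float, b_high: float) -> float:
    """Mono-exponential ADC (mm^2/s) from signals at two b-values (s/mm^2).

    Model: S(b) = S0 * exp(-b * ADC), so ADC = ln(S_low / S_high) / (b_high - b_low).
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signal intensities must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must exceed b_low")
    return math.log(s_low / s_high) / (b_high - b_low)


if __name__ == "__main__":
    # Hypothetical signals: same low-b signal, two different high-b signals.
    for s_high in (400.0, 500.0):
        adc = adc_two_point(1000.0, s_high, b_low=0.0, b_high=800.0)
        print(f"high-b signal {s_high:.0f}: ADC = {adc:.6f} mm^2/s")
```

Under this model, a higher high-b signal with an unchanged low-b signal gives a lower ADC, which matches the direction of the SIR and ADC differences reported in the Results.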
Results
The scores for overall image quality (P < 0.001), image noise (P < 0.001), and hepatic edge sharpness in the right lobe (P = 0.001) were higher in DWIOriginal compared with DWICWAF. In contrast, hepatic edge sharpness (P < 0.001) and lesion conspicuity (P < 0.001) in the left lobe were superior in DWICWAF. Liver and lesion SIRs were higher in DWICWAF across all segments than in DWIOriginal (P < 0.007). Liver ADC values were lower in DWICWAF than in DWIOriginal in all segments (P < 0.001). Lesion ADC values were also lower in DWICWAF than in DWIOriginal in the right lobe (P < 0.001) but were not different in the left lobe (P = 0.48).
Conclusion
CWAF improved hepatic edge sharpness and lesion conspicuity in the left lobe, although overall image quality was slightly reduced. ADC values were generally lower in DWICWAF than in DWIOriginal.
Tetsuro Kaga, Yoshifumi Noda, Masashi Asano, Nobuyuki Kawai, Shingo Omata, Yukiko Takai, Satoshi Ido, Kimihiro Kajita, Abdelazim Elsayed Elhelaly, Hirohiko Imai, Hiroki Kato, Masayuki Matsuo. "Enhancing liver diffusion-weighted imaging quality with correlation-weighted averaging: notable benefits in the left hepatic lobe". European Journal of Radiology, vol. 196, Article 112680. Publication date: 2026-01-18. DOI: 10.1016/j.ejrad.2026.112680
Pub Date: 2026-01-17, DOI: 10.1016/j.ejrad.2026.112688
Joanna F Dipnall, Thomas O'Donnell, Richard S Page, Raphael Hau, Richard de Steiger, Andrew Bucknill, Andrew Oppy, Elton Edwards, Dinesh Varma, Ronan A Lyons, Peter Cameron, William Veitch, Emily Doole, Berwout Wiltschut, Leah Sleaby, Adil Zia, Robin Lee, Belinda J Gabbe
Purpose: This study used different metrics to assess the reliability of radiology text and images in Distal Radial Fractures (DRF) classifications using classifiers with varying levels of experience.
Methods: A random sample of 534 patients (16+ years) admitted to two major trauma centres for > 24 h for DRF management, with 1,269 radiology images and radiology text reports, was reviewed. Eight classifiers with varying levels of experience were randomly assigned patients, with overlap, to classify four different DRF classifications, nine radiological features and one treatment type: two interns (802 text/images), three registrars (1,079 text/images) and three orthopaedic trauma specialists (740 text/images). The agreement measures used were percentage agreement (PA), the Brennan-Prediger coefficient, Cohen/Conger kappa, Fleiss kappa, Gwet's AC and Krippendorff's alpha coefficient, all with 95% confidence intervals.
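To make the difference between agreement coefficients concrete, here is a minimal two-rater sketch of percentage agreement, Cohen's kappa and the Brennan-Prediger coefficient. It is not the study's analysis code: the function names and the toy ratings are hypothetical, and the multi-rater coefficients (Conger, Fleiss, Gwet, Krippendorff) are not covered.

```python
from collections import Counter
from typing import Hashable, Sequence


def percentage_agreement(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Share of items on which the two raters gave the same label."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance agreement estimated from each rater's marginals."""
    n = len(r1)
    p_o = percentage_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both raters use one identical label
    return (p_o - p_e) / (1.0 - p_e)


def brennan_prediger(r1: Sequence[Hashable], r2: Sequence[Hashable], n_categories: int) -> float:
    """Brennan-Prediger coefficient: chance agreement fixed at 1 / n_categories."""
    if n_categories < 2:
        raise ValueError("n_categories must be at least 2")
    p_e = 1.0 / n_categories
    return (percentage_agreement(r1, r2) - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Invented ratings for five cases with a heavily skewed label distribution.
    rater_a = ["A", "A", "A", "A", "B"]
    rater_b = ["A", "A", "A", "B", "A"]
    print("PA:", percentage_agreement(rater_a, rater_b))
    print("Cohen kappa:", cohen_kappa(rater_a, rater_b))
    print("Brennan-Prediger:", brennan_prediger(rater_a, rater_b, n_categories=2))
```

With skewed labels like these, the same raw agreement can give a negative Cohen's kappa and a positive Brennan-Prediger value, which is one reason the choice of coefficient matters when reporting reliability.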
Results: For DRF classifications, the ulnar fracture (81%, 77%-86%) then AO Level 1 (67%, 60%-74%) had the highest PA; AO Level 3 had the lowest (29%, 23%-34%). For radiological features: highest PA was the presence/absence of tear drop/volar rim fragment (97%, 96%-98%) and severe dorsal comminution (97%, 96%-98%); lowest was ulnar variance (70%, 57%-83%). Treatment had high PA (96%, 87%-100%). Differences across classifier experience were not significant.
Conclusions: Even with descriptive text from the radiology reports and x-ray images, DRF classification is complex, and classifier experience did not affect classification. Agreement and interrater reliability were generally above fair, but the type and complexity of the classification task and the choice of agreement coefficient were important considerations when reporting the agreement and reliability of the data.
"Inter-rater reliability of a classification systems for distal radius fractures using radiology text and x-rays: what really matters?". European Journal of Radiology, vol. 196, 112688. Publication date: 2026-01-17. DOI: 10.1016/j.ejrad.2026.112688
Pub Date: 2026-01-17, DOI: 10.1016/j.ejrad.2026.112666
Haotian Yuan , Lin Yuan , Jiapeng Chen , Naixu Shi , Detong Lin , Xinyu Wang , Chenfei Kong , Xiaofeng Wang
Head and neck squamous cell carcinoma (HNSCC) is a highly heterogeneous malignancy characterized by altered lactate metabolism, where traditional prognostic indicators are insufficient for precision medicine. This study aimed to construct an enhanced CT radiomics model integrated with lactate metabolism gene-related (LMGR) genomic signatures for HNSCC prognosis using TCGA and TCIA databases. A cohort of 399 HNSCC patients was analyzed. Analysis of 204 lactate-related genes identified 24 differentially expressed LMGR genes (DELMGR). Univariate Cox regression revealed that among these, PKLR, IL19, and CXCL9 exhibited protective effects (HR = 0.932, 0.885, and 0.931, respectively). A lactate classification score (LCS) was derived from the analysis of these three genes, demonstrating a significant correlation with overall survival (OS) in both univariate (HR = 1.807, 95 % CI: 1.346–2.424, P < 0.001) and multivariate assessments (HR = 1.772, 95 % CI: 1.296–2.424, P < 0.001). From enhanced CT images, 2060 radiomic features were extracted. Subsequently, after feature selection using mRMR and RFE algorithms, a support vector machine (SVM) model was built to predict LCS, which generated a radiomics score (RS). The model demonstrated AUC values of 0.773 and 0.760 in the training and validation datasets, respectively. The RS distribution significantly differed between lactate subtypes in the training cohort (P < 0.001), with specifically higher RS in the high-risk LCS group. High RS was associated with poor OS (HR = 3.582, 95 % CI: 1.240–10.348, P = 0.018) and was correlated with clinical features such as the perineural invasion and the margin status. Mechanistic analysis indicated that the high RS group was enriched in an immunosuppressive microenvironment and was associated with fatty acid metabolism pathways. 
This enhanced CT-based radiomics model effectively predicts lactate-based stratification, demonstrating potential prognostic value in HNSCC and providing novel biomarkers as well as a non-invasive predictive tool for prognostic assessment.
"Development and validation of a radiomics model for lactate metabolism genes-based stratification and prognostic prediction in head and neck squamous cell carcinoma". European Journal of Radiology, vol. 196, Article 112666. Publication date: 2026-01-17. DOI: 10.1016/j.ejrad.2026.112666
Pub Date: 2026-01-17, DOI: 10.1016/j.ejrad.2026.112687
Lei Cai , Xiaoyu Tong , Jingyi Ju , Zhaoyang Li , Deying Wen , Shuang Liu , Huilou Liang , Yufang Wang , Jiayu Sun
Rationale and Objectives
To investigate the diagnostic value of the enhancement slope in an 18-second ultrafast dynamic contrast-enhanced MRI (DCE-MRI) using differential subsampling with Cartesian ordering (DISCO) in quantifying Crohn’s Disease (CD) activity.
Materials and Methods
In a prospective cohort, 41 patients with CD (141 diseased segments) underwent endoscopy and 3.0 T magnetic resonance enterography (MRE). The DISCO sequence was employed for ultrafast DCE scanning. Using endoscopic results as the gold standard, the slope of the dynamic enhancement curve (K) and the magnetic resonance index of activity (MaRIA) were calculated. The correlations between the K value, MaRIA, relative contrast enhancement (RCE), and simple endoscopic activity score for Crohn’s disease (SES-CD) were analyzed. Diagnostic performance for categorizing CD activity (remission, mild, moderate–severe) was assessed by receiver operating characteristic (ROC) curve analysis. To optimize the MaRIA for CD activity assessment, a modified MaRIA index was constructed by substituting the RCE in the original MaRIA with the K value. The diagnostic efficacy of this modified MaRIA was further validated, and its performance was compared with the original MaRIA to verify its clinical utility in distinguishing different CD activity states.
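The ROC-based assessment described above can be sketched in a few lines. The example below computes the AUC as the probability that a diseased segment scores higher than a non-diseased one, and then sensitivity and specificity at a fixed cutoff. The scores are invented, the function names are hypothetical, and treating values at or above the cutoff as positive is an assumption; only the cutoff value 22.64 is taken from the abstract.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec_at_cutoff(
    pos_scores: Sequence[float], neg_scores: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff is called positive (assumed rule)."""
    sensitivity = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
    specificity = sum(s < cutoff for s in neg_scores) / len(neg_scores)
    return sensitivity, specificity


if __name__ == "__main__":
    CUTOFF = 22.64  # cutoff reported in the abstract; the scores below are invented
    moderate_severe = [30.0, 25.0, 20.0]
    other_segments = [10.0, 15.0, 22.0]
    print("AUC:", auc_mann_whitney(moderate_severe, other_segments))
    print("sens, spec:", sens_spec_at_cutoff(moderate_severe, other_segments, CUTOFF))
```

The pairwise form is quadratic in the number of segments, which is fine for a cohort of this size; a rank-based implementation would be the usual choice for larger data.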
Results
K correlated positively with the SES-CD score (r = 0.77, P < 0.001) and the MaRIA score (r = 0.70, P < 0.001), while the correlation between RCE and the SES-CD score was relatively weak (r = 0.54, P < 0.001). For diagnosing moderate-to-severe CD, K showed an AUC of 0.916 (95% CI: 0.869, 0.950); at the clinically relevant cutoff of 22.64, it yielded a sensitivity of 87.56% and a specificity of 83.44%. Notably, there was no significant difference in diagnostic performance between K and MaRIA. However, the AUCs for diagnosing remission and mild disease were 0.609 (95% CI: 0.512, 0.701) and 0.889 (95% CI: 0.829, 0.934), respectively, slightly lower than those of MaRIA. The modified MaRIA score demonstrated high diagnostic efficacy in differentiating remission-phase, mild, and moderate-to-severe CD, with AUC values of 0.947 (95% CI: 0.902, 0.970), 0.964 (95% CI: 0.922, 0.987), and 0.981 (95% CI: 0.936, 0.998), respectively. Additionally, its sensitivity and specificity both exceeded 85%.
Conclusion
The 18-second ultrafast DCE-MRI enhancement slope streamlines workflow while serving as a robust noninvasive biomarker for CD activity. This methodology exhibits strong diagnostic efficacy in distinguishing mild and moderate-to-severe Crohn’s disease. Furthermore, incorporating K into MaRIA enhances the detection of remission-phase CD.
"Enhancement slope of ultrafast dynamic contrast-enhanced MRI: a promising biomarker for assessing Crohn’s disease activity". European Journal of Radiology, vol. 196, Article 112687. Publication date: 2026-01-17. DOI: 10.1016/j.ejrad.2026.112687
Pub Date: 2026-01-17, DOI: 10.1016/j.ejrad.2026.112689
Jianliang Lu , Keith Wan-Hang Chiu , Chelsea Chan , Ho-Ming Cheng , Jian Zhou , Justin Christopher NG , Fanny Fong Yi Tang , Wai Kuen Kan , Philip Leung Ho Yu , Wai-Kay Seto
Background
This study evaluates the performance of a general-purpose (GPT-4) and a medically fine-tuned (Med-LM) large language model (LLM) in classifying liver lesions from unstructured Computed Tomography (CT) reports.
Methods
Consecutive CT reports (2014–2020) from five institutions were input into GPT-4 and Med-LM with simple (sp) and optimised (op) prompts. Lesion- and patient-level performance were benchmarked against LI-RADS scores assigned by two radiologists, and report quality was analysed using a 5-point Likert scale.
Results
A total of 296 CT reports (mean age, 64.6 years ± 11.3 [SD]; 193 men; 654 lesions) were included. Lesion- and patient-level accuracies for LI-RADS scoring ranged from 40.8% (Med-LMsp) to 61.3% (Med-LMop) and from 27.7% (Med-LMsp) to 52.4% (Med-LMop), respectively. When lesions were dichotomized into malignant and benign, lesion- and patient-level accuracies rose, ranging from 56.1% (GPT-4sp) to 82.3% (Med-LMop) and from 71.3% (Med-LMsp) to 86.5% (Med-LMop), respectively. Med-LMop demonstrated the highest performance in all analyses and was statistically superior to other models (all p < 0.001). Non-classification rates ranged between 12.7% (Med-LMop) and 40.5% (GPT-4sp), particularly for benign lesions. Kappa values were weak to moderate between the two reviewers in different aspects of report quality (0.471–0.766), and Likert scores for lesion information differed significantly between correctly and incorrectly classified lesions (all p ≤ 0.04). Repeatability varied widely from 12.7% (Med-LMop) to 39.0% (GPT-4sp).
Conclusions
Med-LM outperforms GPT-4 in classifying liver lesions from unstructured CT reports, and both models are better at detecting malignancy than at full LI-RADS classification. However, high misclassification rates and inconsistent repeatability hinder their clinical use.
"Characterising liver lesions from free-text computer tomography reports – A real-world multicentre analysis". European Journal of Radiology, vol. 196, Article 112689. Publication date: 2026-01-17. DOI: 10.1016/j.ejrad.2026.112689
Pub Date: 2026-01-16, DOI: 10.1016/j.ejrad.2026.112679
Yi Chen , Wei Liu , Tiansong Xie , Meng Gao , Jing Sun , Zehua Zhang , Lei Chen , Yu Wang , Jin Xu , Zhengrong Zhou
Importance
Occult lymph node metastasis (LNM) remains challenging to detect preoperatively in resectable pancreatic ductal adenocarcinoma (PDAC), yet has significant implications for treatment and prognosis.
Objective
To evaluate the predictive value of spectral CT–based habitat imaging and CT-diagnosed peripancreatic invasion for occult LNM in resectable PDAC, with pathological validation using collagen ratio.
Methods
This retrospective study included 113 patients with resectable PDAC who underwent triple-phase spectral CT before surgery. Occult LNM was defined as pathologically confirmed nodal metastasis without radiologically suspicious lymph nodes; nodes with short-axis diameter ≥10 mm were excluded. Tumors were segmented into subregions based on pancreatic-to-venous phase iodine concentration ratio (PVICR), and subregional volume fractions were quantified. Correlations with collagen ratio were assessed via Spearman analysis. Patients were divided into a training cohort (n = 79; 30 with occult LNM) and a validation cohort (n = 34; 12 with occult LNM). A logistic regression model with backward stepwise selection was developed and evaluated by receiver operating characteristic analysis.
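The habitat step above can be pictured as two small computations: a per-voxel PVICR and the share of tumor volume falling in each subregion. The sketch below assumes equal voxel sizes and uses fixed cut points to assign subregions. The abstract does not say how the subregions were actually derived, so the cut points, the function names and the voxel values are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Sequence


def pvicr(ic_pancreatic: float, ic_venous: float) -> float:
    """Pancreatic-to-venous phase iodine concentration ratio for one voxel."""
    if ic_venous <= 0:
        raise ValueError("venous-phase iodine concentration must be positive")
    return ic_pancreatic / ic_venous


def subregion_fractions(ratios: Sequence[float], cut_points: Sequence[float]) -> Dict[int, float]:
    """Volume fraction per subregion, assuming equally sized voxels.

    Subregion 1 holds the lowest ratios; cut_points must be sorted ascending.
    """
    if not ratios:
        raise ValueError("at least one voxel is required")
    labels = [bisect_right(cut_points, r) + 1 for r in ratios]
    counts = Counter(labels)
    return {k: counts.get(k, 0) / len(ratios) for k in range(1, len(cut_points) + 2)}


if __name__ == "__main__":
    # Invented per-voxel ratios and illustrative cut points giving four subregions.
    voxel_ratios = [0.5, 0.6, 0.9, 1.2, 1.6, 2.0, 0.4, 1.0]
    print(subregion_fractions(voxel_ratios, cut_points=[0.7, 1.1, 1.5]))
```

In this picture, the Subregion 1 fraction used in the combined model is simply the first entry of the returned mapping.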
Results
Four subregions were identified. Subregion 1, characterized by the lowest PVICR, showed a moderate negative correlation with collagen ratio (r = −0.543, p < 0.001). The combined model incorporating the Subregion 1 fraction and CT-diagnosed peripancreatic invasion yielded areas under the curve of 0.836 (95% CI: 0.733–0.921) and 0.820 (95% CI: 0.661–0.955) in the training and validation cohorts, respectively.
Conclusions
Subregion 1 fraction and CT-diagnosed peripancreatic invasion enable accurate preoperative prediction of occult LNM in resectable PDAC.
"Spectral CT-Based habitat imaging for the prediction of occult lymph node metastasis in resectable pancreatic ductal Adenocarcinoma: Pathological validation via collagen ratio". European Journal of Radiology, vol. 196, Article 112679. Publication date: 2026-01-16. DOI: 10.1016/j.ejrad.2026.112679
Pub Date: 2026-01-16 DOI: 10.1016/j.ejrad.2026.112669
Dongfan Liu , Yingwei Sun , Lunhao Bai , Chunbo Deng
Background
Medial meniscal extrusion (MME) accelerates structural progression in knee osteoarthritis (KOA). While its biomechanical impact has been established, its relationship with subchondral bone denudation and the potential mediating role of synovitis remain unclear.
Purpose
This study aimed to investigate the association between MME in the absence of medial meniscal posterior root tears and the size of denuded areas of subchondral bone (dABs), and to evaluate whether synovitis mediates this relationship.
Methods
Data from the Foundation for the National Institutes of Health (FNIH) Osteoarthritis Biomarkers Consortium were analyzed. MME and synovitis (effusion-synovitis and Hoffa-synovitis) were assessed semi-quantitatively using the MRI Osteoarthritis Knee Score (MOAKS) system. The size of medial tibiofemoral dABs was quantified at baseline and 24-month follow-up. Linear regression models evaluated cross-sectional and longitudinal associations. Causal mediation analysis was conducted to quantify the proportion of the total effect of MME on dABs mediated by synovitis.
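The "proportion mediated" idea in the Methods can be illustrated with a minimal product-of-coefficients sketch. This is a simplified stand-in under stated assumptions, not the authors' causal mediation procedure (which the abstract does not detail): the toy arrays `x` (exposure, standing in for an MME grade), `m` (mediator, standing in for a synovitis score) and `y` (outcome, standing in for dAB size) are made up for illustration, all models are plain ordinary least squares, and no confidence intervals are computed.

```python
from statistics import mean


def _centered(values):
    """Subtract the mean so that slopes can be computed without an explicit intercept."""
    mu = mean(values)
    return [v - mu for v in values]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slope(x, y):
    """OLS slope of y regressed on x (intercept included implicitly)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def slopes_two_predictors(x, m, y):
    """OLS slopes of y regressed jointly on x and m (intercept included implicitly)."""
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    det = sxx * smm - sxm ** 2
    direct = (smm * sxy - sxm * smy) / det  # effect of x on y holding m fixed
    b = (sxx * smy - sxm * sxy) / det       # effect of m on y holding x fixed
    return direct, b


def proportion_mediated(x, m, y):
    """Return (proportion, total, direct, indirect) using indirect = a * b."""
    total = slope(x, y)                      # y ~ x
    a = slope(x, m)                          # m ~ x
    direct, b = slopes_two_predictors(x, m, y)  # y ~ x + m
    indirect = a * b
    return indirect / total, total, direct, indirect


# Hypothetical toy data (not from the study): y is built as exactly 2*x + 1*m.
x = [0, 0, 1, 1, 2, 2, 3, 3]
m = [0, 1, 1, 1, 1, 2, 2, 3]
y = [2 * xi + mi for xi, mi in zip(x, m)]

prop, total, direct, indirect = proportion_mediated(x, m, y)
print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f} proportion={prop:.3f}")
```

For linear OLS models the total effect equals the direct effect plus the indirect effect, so the proportion mediated is simply the indirect share of the total. Counterfactual-based mediation methods generalize this and usually obtain confidence intervals by resampling.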
Results
A total of 520 participants were included. Cross-sectionally, both baseline MME (β = 2.03, 95 % CI: 0.67, 3.38) and synovitis score (β = 0.97, 95 % CI: 0.48, 1.46) were significantly associated with central medial femoral (cMF) dABs. Longitudinal analysis revealed significant correlations between MME and both 24-month cMF dABs (β = 3.41, 95 % CI: 1.49, 5.33) and 24-month medial tibial (MT) dABs (β = 0.98, 95 % CI: 0.47, 1.87). Furthermore, the 24-month synovitis score showed significant associations with both cMF dABs (β = 2.06, 95 % CI: 1.38, 2.74) and MT dABs (β = 0.88, 95 % CI: 0.55, 1.21). Mediation analysis indicated that synovitis mediated 20.1 % (95 % CI: 6.6, 71.1) of the effect of MME on baseline cMF dABs. Synovitis at 24 months mediated 20.38 % (95 % CI: 6.61, 44.50) of the effect of MME on 24-month cMF dABs and 16.86 % (95 % CI: 2.71, 42.16) of its effect on 24-month MT dABs.
Conclusion
MME and dABs showed significant correlations in both cross-sectional and longitudinal analyses. Synovitis acted as a mediator between MME and dABs, suggesting that inflammatory pathways may be involved in the mechanisms by which MME promotes KOA progression.
{"title":"Synovitis mediates the association between medial meniscal extrusion and subchondral bone denudation in knee osteoarthritis: Data from the FNIH OA biomarkers consortium","authors":"Dongfan Liu , Yingwei Sun , Lunhao Bai , Chunbo Deng","doi":"10.1016/j.ejrad.2026.112669","DOIUrl":"10.1016/j.ejrad.2026.112669","url":null,"abstract":"<div><h3>Background</h3><div>Medial meniscal extrusion (MME) accelerates structural progression in knee osteoarthritis (KOA). While its biomechanical impact has been established, its relationship with subchondral bone denudation and the potential mediating role of synovitis remain unclear.</div></div><div><h3>Purpose</h3><div>This study aimed to investigate the association between MME in the absence of medial meniscal posterior root tears and the size of denuded areas of subchondral bone (dABs), and to evaluate whether synovitis mediates this relationship.</div></div><div><h3>Methods</h3><div>Data from the Foundation for the National Institutes of Health (FNIH) Osteoarthritis Biomarkers Consortium were analyzed. MME and synovitis (effusion-synovitis and Hoffa-synovitis) were assessed semi-quantitatively using the MRI Osteoarthritis Knee Score (MOAKS) system. The size of medial tibiofemoral dABs was quantified at baseline and 24-month follow-up. Linear regression models evaluated cross-sectional and longitudinal associations. Causal mediation analysis was conducted to quantify the proportion of the total effect of MME on dABs mediated by synovitis.</div></div><div><h3>Results</h3><div>A total of 520 participants were included. Cross-sectionally, both baseline MME (β = 2.03,95 % CI: 0.67, 3.38) and synovitis score (β = 0.97,95 % CI: 0.48, 1.46) were significantly associated with central medial femoral (cMF) dABs. Longitudinal analysis revealed significant correlations between MME and both 24-month cMF dABs (β = 3.41,95 % CI: 1.49,5.33) and 24-month medial tibial (MT) dABs (β = 0.98,95 % CI: 0.47,1.87). 
Furthermore, the 24-month synovitis score showed significant associations with both cMF dABs (β = 2.06,95 % CI: 1.38,2.74) and MT dABs (β = 0.88,95 % CI: 0.55,1.21). Mediation analysis indicated that synovitis mediated 20.1 % (95 % CI: 6.6, 71.1) of the effect of MME on baseline cMF dABs. 24-month synovitis mediated 20.38 % (95 % CI: 6.61, 44.50) of the effect of MME on 24-month cMF dABs and 16.86 % (95 % CI: 2.71, 42.16) of its effect on 24-month MT dABs.</div></div><div><h3>Conclusion</h3><div>MME and dABs showed significant correlations in both cross-sectional and longitudinal studies. Synovitis acted as a mediator between MME and dABs, suggesting that inflammatory pathways may be involved in the pathological mechanisms of MME promoting KOA progression.</div></div>","PeriodicalId":12063,"journal":{"name":"European Journal of Radiology","volume":"196 ","pages":"Article 112669"},"PeriodicalIF":3.3,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146017550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-14 DOI: 10.1016/j.ejrad.2026.112676
Bingjie Wu , Lingwei Wang , Yang Wang , Fan Liu , Xujie Gao , Wenpeng Wang , Bohan Xiao , Ying Liu
Objective
To investigate whether pre-treatment T2WI-based multiregional radiomics can predict the probability of post-treatment tumor deposit (TD) and prognostic outcomes in patients with resectable rectal cancer after neoadjuvant therapy.
Materials and methods
This retrospective study included 159 patients with pathologically confirmed rectal cancer who received neoadjuvant therapy and then underwent surgery from March 2013 to March 2024. Radiomics features were extracted from the intratumoral region, a 3-mm region straddling the tumor margin, and a 3-mm peritumoral region on pre-treatment T2WI images. A clinical-radiomics nomogram was developed based on the most predictive radiomics signatures and clinical risk factors. A prognostic model for 5-year recurrence-free survival (RFS) was constructed by Cox regression analysis.
Results
The nomogram integrating clinical risk factors (tumor distance to the anal margin and MRI-reported extramural vascular invasion (EMVI)) with an intra-straddle 3 mm radiomics signature score (radscore) demonstrated the best predictive performance, with areas under the receiver operating characteristic curve (AUCs) of 0.953 (95% CI: 0.877–0.988), 0.810 (95% CI: 0.629–0.928) and 0.952 (95% CI: 0.857–0.992) in the training, validation and test cohorts, respectively. The prognostic model constructed from the intra-straddle 3 mm radscore (hazard ratio [HR] = 3.60, 95% CI: 1.59–8.16) and MRI-reported EMVI (HR = 6.07, 95% CI: 2.51–14.63) showed good performance for predicting 5-year RFS, with an AUC of 0.827 (95% CI: 0.772–0.890) in the entire cohort.
Conclusion
The nomogram, incorporating the pre-treatment MRI-based intra-straddle 3 mm radscore along with clinical risk factors, facilitates noninvasive assessment of the likelihood of TD positivity following neoadjuvant therapy and can predict 5-year RFS in patients with resectable rectal cancer.
{"title":"Utilizing baseline multiregional MRI radiomics for prediction of tumor deposition and prognosis following neoadjuvant therapy in resectable rectal cancer","authors":"Bingjie Wu , Lingwei Wang , Yang Wang , Fan Liu , Xujie Gao , Wenpeng Wang , Bohan Xiao , Ying Liu","doi":"10.1016/j.ejrad.2026.112676","DOIUrl":"10.1016/j.ejrad.2026.112676","url":null,"abstract":"<div><h3>Objective</h3><div>To investigate whether pre-treatment T2WI-based multiregional radiomics can predict the probability of post-treatment tumor deposit (TD) and prognostic outcomes in patients with resectable rectal cancer after neoadjuvant therapy.</div></div><div><h3>Materials and methods</h3><div>This retrospective study included 159 patients with pathologically confirmed rectal cancer who received neoadjuvant therapy and then underwent surgery from March 2013 to March 2024. Radiomics features were extracted from the intratumoral region, a 3-mm-region straddling the tumor margin, and peritumoral 3 mm region on pre-treatment T2WI images. Clinical-radiomics nomogram was developed based on the most predictive radiomics signatures and clinical risk factors. Prognostic model for 5-year recurrence-free survival (RFS) was constructed by Cox regression analysis.</div></div><div><h3>Results</h3><div>The nomogram integrating clinical risk factors (Tumor distance to anal margin and MRI-reported extramural vascular invasion (EMVI)) with an intra-straddle 3 mm radiomics signature score (radscore) demonstrated optimal predictive performance with area under the receiver operating characteristic curve (AUC) of 0.953 (95% CI: 0.877–0.988), 0.810 (95% CI: 0.629–0.928) and 0.952 (95% CI: 0.857–0.992) in the training cohort, validation cohort and test cohort, respectively. 
The prognostic model constructed by intra-straddle 3 mm radscore (hazard ratio [HR] = 3.60, 95% CI: 1.59–8.16) and MRI-reported EMVI (HR = 6.07, 95% CI: 2.51–14.63) showed good performance for predicting 5‑year RFS with AUC of 0.827 (95% CI: 0.772–0.890) in the entire cohort.</div></div><div><h3>Conclusion</h3><div>The nomogram, incorporating pre-treatment MRI-based intra-straddle 3 mm radscore along with clinical risk factors, facilitates noninvasive assessment of the likelihood of TD positivity following neoadjuvant therapy, and has the power to predict 5-year RFS in patients with resectable rectal cancer.</div></div>","PeriodicalId":12063,"journal":{"name":"European Journal of Radiology","volume":"196 ","pages":"Article 112676"},"PeriodicalIF":3.3,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146074619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-14 DOI: 10.1016/j.ejrad.2026.112678
Po-Hsuan Hsieh , Ya-Fang Chen , Ta-Fu Chen , Wen-Chau Wu , for the Alzheimer’s Disease Neuroimaging Initiative
Background
The complex brain changes involved in Alzheimer’s disease (AD) development constitute a high-dimensional nonlinear feature space where deep learning (DL) classification/diagnosis may be advantageous over classical non-learning methods. However, the practicality of DL remains under debate among healthcare professionals, largely because many models are computationally expensive and operate without explicit interpretability. This study aimed to construct a lightweight DL model to disclose the association between cognitive status and structural brain changes in AD.
Methods
Using data from the Alzheimer’s Disease Neuroimaging Initiative database, we included 418 AD patients and 418 age-matched cognitively normal (CN) subjects for DL model construction based on their T1-weighted magnetic resonance images at the baseline visit. A lightweight design was achieved by incorporating group convolution, global pooling, and efficient channel attention.
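To make the parameter savings from group convolution concrete, the following sketch counts the learnable parameters of a 3D convolution layer with and without channel grouping. The channel counts, kernel size and group count are hypothetical values chosen for illustration; the abstract does not report the model's actual layer configuration.

```python
def conv3d_params(c_in, c_out, kernel, groups=1, bias=True):
    """Count learnable parameters of a 3D convolution with a cubic kernel.

    With grouping, each output channel only sees c_in / groups input channels,
    so the weight tensor shrinks by a factor of `groups`.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    weights = c_out * (c_in // groups) * kernel ** 3
    return weights + (c_out if bias else 0)


# Hypothetical layer: 64 -> 64 channels, 3x3x3 kernel (not taken from the paper).
standard = conv3d_params(64, 64, 3, groups=1)
grouped = conv3d_params(64, 64, 3, groups=8)

print(f"standard conv: {standard} parameters")
print(f"grouped conv (8 groups): {grouped} parameters")
print(f"weight reduction factor: {(standard - 64) / (grouped - 64):.0f}x")
```

The bias term is unaffected by grouping, so the reduction factor applies to the weights only. Global pooling contributes to a lightweight design in a different way: it replaces large fully connected layers with a per-channel average and has no learnable parameters of its own.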
Results
Our model achieved an accuracy of 90.6 %, competitive with previous models built with up to ten times as many parameters. The occlusion maps showed that the medial temporal area and thalamus contributed most to our model’s differentiation between AD and CN, in line with current knowledge of the pathological trajectory. Hierarchical regression further revealed that the logit of the DL model output explained a significant amount of variance in the Mini-Mental State Examination score, above and beyond the clinical indices including age, sex, and education duration (R² change = 0.341, F(1, 91) = 57.623, p < 0.001).
Conclusions
Lightweight DL can be clinically practicable for AD diagnosis by focusing on pathologically interpretable structural changes and offering image-based assessment of cognitive status.
{"title":"Association between cognitive status and structural brain changes in Alzheimer’s disease: Clinical implication of lightweight deep learning-aided diagnosis","authors":"Po-Hsuan Hsieh , Ya-Fang Chen , Ta-Fu Chen , Wen-Chau Wu , for the Alzheimer’s Disease Neuroimaging Initiative","doi":"10.1016/j.ejrad.2026.112678","DOIUrl":"10.1016/j.ejrad.2026.112678","url":null,"abstract":"<div><h3>Background</h3><div>The complex brain changes involved in Alzheimer’s disease (AD) development constitute a high-dimensional nonlinear feature space where deep learning (DL) classification/diagnosis may be advantageous over classical non-learning methods. However, the practicality of DL remains under debate among healthcare professionals, largely because many models are computationally expensive and operate without explicit interpretability. This study aimed to construct a lightweight DL model to disclose the association between cognitive status and structural brain changes in AD.</div></div><div><h3>Methods</h3><div>By using the data obtained from the Alzheimer’s Disease Neuroimaging Initiative database, 418 AD patients and 418 age-matched cognitively normal (CN) subjects were included for DL model construction based on their T1-weighted magnetic resonance images at baseline visit. A lightweight design was achieved by incorporating group convolution, global pooling, and efficient channel attention.</div></div><div><h3>Results</h3><div>The accuracy rate of our model was 90.6 %, competitive with previous models built with up-to-ten times more parameters. The occlusion maps showed that the medial temporal area and thalamus accounted the most for our model’s differentiation between AD and CN, in line with current knowledge of the pathological trajectory. 
Hierarchical regression further revealed that the logit of the DL model output explained a significant amount of variance in the mini mental state examination score, above and beyond the clinical indices including age, sex, and education duration (<em>R</em><sup>2</sup> change = 0.341, <em>F</em>(1, 91) = 57.623, <em>p</em> < 0.001).</div></div><div><h3>Conclusions</h3><div>Lightweight DL can be clinically practicable for AD diagnosis by focusing on pathologically interpretable structural changes and offering image-based assessment of cognitive status.</div></div>","PeriodicalId":12063,"journal":{"name":"European Journal of Radiology","volume":"196 ","pages":"Article 112678"},"PeriodicalIF":3.3,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146009423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-14 DOI: 10.1016/j.ejrad.2026.112674
Joy-Marie Kleiß , Sebastian Arndt , Lisa Sommerfeld , Maximilian Schmidt MD , Florian Putz , Teresa Graetz , Leonard Stepansky , Kaan Türkan , Simon Mayr , Michael Uder , Matthias S May
Rationale and Objectives
This study evaluates the accuracy of the nn-UNet TotalSegmentator (TS) by Wasserthal et al. (2023) in segmenting atypical livers with pathologies and variants in CT scans.
Materials and Methods
CT scans were retrospectively collected from our radiology information system (RIS) and divided into two cohorts: a reference group (67 healthy livers) and a study group (55 scans across eleven pathology and variant subgroups). TS performed automatic segmentation for all groups. For reference, the images were then manually segmented, with corrections reviewed by two radiologists. Accuracy was assessed using Dice similarity score, Hausdorff distance (HD), mean surface distance (MSD), volume difference, and clinical ratings.
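Two of the accuracy metrics named above, the Dice similarity score and the volume difference, can be sketched in a few lines. This is a minimal illustration, assuming each segmentation is represented as a set of voxel coordinates and that voxels are isotropic with a known volume. The cube-shaped masks and the 1 mm³ voxel size are hypothetical and are not taken from the study.

```python
def dice(pred, ref):
    """Dice similarity: 2 * |pred ∩ ref| / (|pred| + |ref|), for sets of voxel coordinates."""
    pred, ref = set(pred), set(ref)
    if not pred and not ref:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def volume_ml(voxels, voxel_volume_mm3):
    """Segmented volume in millilitres (1 ml = 1000 mm^3)."""
    return len(voxels) * voxel_volume_mm3 / 1000.0


def volume_difference(pred, ref, voxel_volume_mm3):
    """Return (absolute difference in ml, relative difference in % of the reference).

    Negative values mean the automatic mask underestimates the reference volume.
    """
    v_pred = volume_ml(pred, voxel_volume_mm3)
    v_ref = volume_ml(ref, voxel_volume_mm3)
    diff = v_pred - v_ref
    return diff, 100.0 * diff / v_ref


# Hypothetical masks: a 10x10x10 reference cube and a prediction missing one slab.
reference = {(x, y, z) for x in range(10) for y in range(10) for z in range(10)}
predicted = {(x, y, z) for x in range(9) for y in range(10) for z in range(10)}

score = dice(predicted, reference)
diff_ml, diff_pct = volume_difference(predicted, reference, voxel_volume_mm3=1.0)
print(f"Dice={score:.4f}, volume difference={diff_ml:.3f} ml ({diff_pct:.1f}%)")
```

Dice rewards overlap but is insensitive to where the errors occur, which is one reason overlap scores and clinical ratings can disagree. Surface-based metrics such as HD and MSD require boundary extraction and are omitted from this sketch.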
Results
Automatic segmentation underestimated liver volume by a mean of 48.11 ml (3.1%) in the reference group and overestimated it in 84% of study group cases by 79.09 ml (4%).
The average Dice score was 0.980 ± 0.007 for the reference group and 0.933 ± 0.113 for the study group. Hepatomegaly achieved the highest score (0.979 ± 0.006), Polycystic liver disease (PLD) the lowest (0.656 ± 0.230). Cirrhosis with Ascites, Beavertail, and PLD had significantly lower Dice scores than the reference group. Clinical ratings were often lower than Dice scores suggested, especially in Beavertail, Cirrhosis with Ascites, Ablation defects, Metastases, and Hemihepatectomy.
Conclusion
TS performs excellently on healthy livers and well on most pathological livers. Despite high Dice scores in many pathological cases, clinical ratings reveal limitations. Clinical evaluation remains essential. Inclusion of PLD and Beavertail cases in training data may reduce bias and improve performance.
{"title":"Performance analysis of liver segmentation using nn-UNet TotalSegmentator: Focus on atypical livers, pathologies, and variants","authors":"Joy-Marie Kleiß , Sebastian Arndt , Lisa Sommerfeld , Maximilian Schmidt MD , Florian Putz , Teresa Graetz , Leonard Stepansky , Kaan Türkan , Simon Mayr , Michael Uder , Matthias S May","doi":"10.1016/j.ejrad.2026.112674","DOIUrl":"10.1016/j.ejrad.2026.112674","url":null,"abstract":"<div><h3>Rationale and Objectives</h3><div>This study evaluates the accuracy of the nn-UNet TotalSegmentator (TS) by Wasserthal et al. (2023) in segmenting atypical livers with pathologies and variants in CT scans.</div></div><div><h3>Materials and Methods</h3><div>CT scans were retrospectively collected from our RIS and divided into two cohorts: a reference group (67 healthy livers) and a study group (55 scans across eleven pathology and variant subgroups). TS performed automatic segmentation for all groups. For reference, the images were then manually segmented, with corrections reviewed by two radiologists. Accuracy was assessed using Dice similarity score, Hausdorff distance (HD), mean surface distance (MSD), volume difference, and clinical ratings.</div></div><div><h3>Results</h3><div>Automatic segmentation underestimated liver volume by a mean of 48.11 ml (3.1%) in the reference group and overestimated it in 84% of study group cases by 79.09 ml (4%).</div><div>The average Dice score was 0.980 ± 0.007 for the reference group and 0.933 ± 0.113 for the study group. Hepatomegaly achieved the highest score (0.979 ± 0.006), Polycystic liver disease (PLD) the lowest (0.656 ± 0.230). Cirrhosis with Ascites, Beavertail, and PLD had significantly lower Dice scores than the reference group. 
Clinical ratings were often lower than Dice scores suggested, especially in Beavertail, Cirrhosis with Ascites, Ablation defects, Metastases, and Hemihepatectomy.</div></div><div><h3>Conclusion</h3><div>TS performs excellently on healthy and well on most pathological livers. Despite high Dice scores in many pathological cases, clinical ratings reveal limitations. Clinical evaluation remains essential. Inclusion of PLD and Beavertail cases in training data may reduce bias and improve performance.</div></div>","PeriodicalId":12063,"journal":{"name":"European Journal of Radiology","volume":"196 ","pages":"Article 112674"},"PeriodicalIF":3.3,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146035336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}