This study aims to develop a 2.5D deep learning framework leveraging non-contrast CT scans for early prediction of hepatic encephalopathy (HE) in hepatitis B-related acute-on-chronic liver failure (ACLF) patients. This retrospective study enrolled 228 ACLF patients meeting APASL criteria from two centers. Participants were divided into training (n = 102), internal validation (n = 44), and external testing (n = 82) cohorts. Non-contrast CT scans (5 mm slices) from six scanner models were preprocessed to 1 × 1 × 1 mm³ isotropic resolution with windowing (30-110 HU). Liver ROIs were manually segmented by two radiologists. The image centered on the maximal cross-sectional slice and its adjacent slices (±1, ±2, and ±4) were extracted to form the 2.5D inputs. Deep learning models (DenseNet121, DenseNet201, ResNet50, InceptionV3) were employed for feature extraction. Multi-instance learning (MIL) methods, including probability likelihood histograms and bag-of-words, were used for feature fusion. Machine learning classifiers (Logistic Regression, RandomForest, LightGBM) with 5-fold cross-validation were built for HE prediction. DenseNet121 demonstrated the best slice-level prediction performance (validation AUC: 0.698). The LightGBM classifier with MIL fusion achieved AUCs of 0.969 (training), 0.886 (validation), and 0.829 (external testing), outperforming other fusion methods. Grad-CAM visualizations confirmed model attention to peri-portal fibrotic regions, demonstrating anatomical relevance. The MIL-based 2.5D deep learning model effectively predicts HE risk using routine non-contrast CT in ACLF patients, providing a non-invasive method for individualized risk assessment.
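As an illustration of the multi-instance fusion stage described above, the following minimal sketch bins per-slice HE probabilities from a CNN backbone into a probability-likelihood histogram per patient and feeds it to a LightGBM classifier. This is not the authors' code; the function names, bin count, and hyperparameters are illustrative assumptions.

import numpy as np
from lightgbm import LGBMClassifier

def likelihood_histogram(slice_probs, n_bins=10):
    # Aggregate slice-level probabilities for one patient (one MIL bag)
    # into a fixed-length probability-likelihood histogram.
    hist, _ = np.histogram(slice_probs, bins=n_bins, range=(0.0, 1.0))
    return hist / max(len(slice_probs), 1)

def fit_patient_classifier(bags, labels):
    # bags: list of 1-D arrays of slice-level HE probabilities per patient
    # labels: patient-level HE outcome (0/1)
    X = np.vstack([likelihood_histogram(p) for p in bags])
    clf = LGBMClassifier(n_estimators=200, learning_rate=0.05)
    clf.fit(X, labels)
    return clf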
{"title":"From Liver to Brain: A 2.5D Deep Learning Model for Predicting Hepatic Encephalopathy Using Opportunistic Non-contrast CT in Hepatitis B Related Acute-on-Chronic Liver Failure Patients.","authors":"Zonglin Liu, Xueyun Zhang, Ying Chen, Qi Zhang, Zhenxuan Ma, Yue Wu, Yuxin Huang, Yajie Li, Xi Zhao, Wenchao Gu, Jiaxing Wu, Ying Tao, Yuxin Shi, Zhenwei Yao, Yan Ren, Yuxian Huang, Shiman Wu","doi":"10.1007/s10278-025-01802-1","DOIUrl":"https://doi.org/10.1007/s10278-025-01802-1","url":null,"abstract":"<p><p>This study aims to develop a 2.5D deep learning framework leveraging non-contrast CT scans for early prediction of hepatic encephalopathy (HE) in hepatitis B-related acute-on-chronic liver failure (ACLF) patients. This retrospective study enrolled 228 ACLF patients meeting APASL criteria from two centers. Participants were divided into training (n = 102), internal validation (n = 44), and external testing (n = 82) cohorts. Non-contrast CT scans (5 mm slices) from six scanner models were preprocessed to 1 × 1 × 1 mm³ isotropic resolution with windowing (30-110 HU). Liver ROIs were manually segmented by two radiologists. The image center on the maximal cross-sectional slice and its adjacent slices (±1/2/4) were extracted to form 2.5D inputs. Deep learning models (DenseNet121, DenseNet201, ResNet50, InceptionV3) were employed for feature extraction. Multi-instance learning methods, including probability likelihood histograms and bag-of-words, were used for feature fusion. Machine learning classifiers (Logistic Regression, RandomForest, LightGBM) with 5-fold cross validation were built for HE prediction. DenseNet121 demonstrated the best slice-level prediction performance (validation AUC: 0.698). The LightGBM classifier with MIL fusion achieved AUCs of 0.969 (training), 0.886 (validation), and 0.829 (external testing), outperforming other fusion methods. Grad-CAM visualizations confirmed model attention to peri-portal fibrotic regions, demonstrating anatomical relevance. The MIL-based 2.5D deep learning model effectively predicts HE risk using routine non-contrast CT in ACLF patients, providing a non-invasive method for individualized risk assessment.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145907323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-05. DOI: 10.1007/s10278-025-01816-9
Weichao Pan
Deep learning models for medical image classification often exhibit overconfident predictions and domain mismatch when transferred from natural image pretraining, which undermines their generalization and clinical reliability. This study proposes TET Loss (Temperature-Entropy calibrated Transfer Loss Function), a plug-and-play objective function that combines temperature scaling to moderate logit sharpness with entropy regularization to promote uncertainty-aware learning. TET Loss is model-agnostic and introduces zero inference-time overhead. Across four public benchmarks (BreastMNIST, DermaMNIST, PneumoniaMNIST, and RetinaMNIST), TET Loss consistently enhances CNNs, transformers, and hybrid backbones under short 10-epoch fine-tuning. For example, EfficientViT-M2 improves its F1 score from 53.9 to 66.7% on BreastMNIST, and BiFormer-Tiny increases its F1 from 73.1 to 86.1% with an AUC gain to 94.1%. On PneumoniaMNIST, RMT-T3 with TET Loss reaches an F1 of 96.4% and an AUC of 99.1%, surpassing several medical-specific architectures trained for 50-150 epochs. Grad-CAM visualizations demonstrate tighter lesion localization and fewer spurious activations, reflecting improved interpretability. By calibrating confidence while preserving discriminative learning, TET Loss provides a lightweight and effective pathway toward more reliable and robust medical imaging systems. Our code will be available at https://github.com/JEFfersusu/TET_loss .
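The loss combines temperature scaling of the logits with an entropy regularizer. A minimal PyTorch sketch of one plausible formulation is shown below; the temperature, entropy weight, and the sign convention of the entropy term are assumptions, and the exact definition in the paper may differ.

import torch
import torch.nn.functional as F

def tet_loss(logits, targets, temperature=2.0, entropy_weight=0.1):
    # Temperature scaling softens the logits before the cross-entropy term.
    scaled = logits / temperature
    ce = F.cross_entropy(scaled, targets)
    # Confidence-penalty style entropy regularization: higher predictive
    # entropy lowers the loss, discouraging overconfident predictions.
    probs = F.softmax(scaled, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=-1).mean()
    return ce - entropy_weight * entropy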
{"title":"TET Loss: A Temperature-Entropy Calibrated Transfer Loss for Reliable Medical Image Classification.","authors":"Weichao Pan","doi":"10.1007/s10278-025-01816-9","DOIUrl":"https://doi.org/10.1007/s10278-025-01816-9","url":null,"abstract":"<p><p>Deep learning models for medical image classification often exhibit overconfident predictions and domain mismatch when transferred from natural image pretraining, which undermines their generalization and clinical reliability. This study proposes TET Loss (Temperature-Entropy calibrated Transfer Loss Function), a plug-and-play objective function that combines temperature scaling to moderate logit sharpness with entropy regularization to promote uncertainty-aware learning. TET Loss is model-agnostic and introduces zero inference-time overhead. Across four public benchmarks (BreastMNIST, DermaMNIST, PneumoniaMNIST, and RetinaMNIST), TET Loss consistently enhances CNNs, transformers, and hybrid backbones under short 10-epoch fine-tuning. For example, EfficientViT-M2 improves its F1 score from 53.9 to 66.7% on BreastMNIST, and BiFormer-Tiny increases its F1 from 73.1 to 86.1% with an AUC gain to 94.1%. On PneumoniaMNIST, RMT-T3 with TET Loss reaches an F1 of 96.4% and an AUC of 99.1%, surpassing several medical-specific architectures trained for 50-150 epochs. Grad-CAM visualizations demonstrate tighter lesion localization and fewer spurious activations, reflecting improved interpretability. By calibrating confidence while preserving discriminative learning, TET Loss provides a lightweight and effective pathway toward more reliable and robust medical imaging systems. Our code will be available at https://github.com/JEFfersusu/TET_loss .</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145907447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-22. DOI: 10.1007/s10278-025-01786-y
Sung Ui Shin, Mijung Jang, Bo La Yun, Su Min Cho, Ji Eun Park, Juyeon Lee, Hye Shin Ahn, Bohyoung Kim, Sun Mi Kim
To assess the diagnostic performance and clinical usefulness of deep learning-based computer-aided detection (AI-CAD) for automated breast ultrasound (ABUS) across radiologists with varying ABUS experience. This retrospective study included 114 women (228 breasts) who underwent ABUS in 2019. Three radiologists interpreted images with and without AI-CAD. We evaluated sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, area under the curve (AUC), reading time, and interobserver agreement in Breast Imaging Reporting and Data System (BI-RADS) categorization and biopsy recommendations. Among 114 women (50.9 ± 10.8 years), 28 were diagnosed with breast cancer. The following performance metrics improved significantly with AI-CAD: Reader 1 (least experienced with ABUS; 2 years of ABUS experience), AUC, 0.837 to 0.947 (p = 0.009), and NPV, 95.8% to 98.4% (p = 0.022); Reader 2 (7 years of experience), PPV, 50.0% to 59.5% (p = 0.042); Reader 3 (8 years of experience), PPV, 55.6% to 66.7% (p = 0.034). Reader 1 with AI-CAD achieved performance comparable to or higher than that of the more experienced readers without AI-CAD. Specifically, compared with Reader 2, specificity (93.5% vs. 88.0%), PPV (65.8% vs. 50.0%), and accuracy (93.0% vs. 87.7%) were higher. Although Reader 3 originally demonstrated higher NPV (98.4% vs. 95.8%) and AUC (0.954 vs. 0.837) without AI-CAD, these differences were no longer significant when Reader 1 used AI-CAD. Across all readers, AI-CAD reduced the mean reading time by an average of 25 s (p < 0.001). Inter-observer agreement after AI-CAD use (BI-RADS κ: 0.279 → 0.363; biopsy recommendation κ: 0.666 → 0.736) showed no statistically significant difference. AI-CAD enhanced diagnostic performance and reading efficiency in ABUS interpretation, demonstrating the most pronounced improvement for the less experienced reader.
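The reader metrics reported above can be reproduced from per-breast decisions with standard formulas; the sketch below (hypothetical variable names, scikit-learn assumed) computes sensitivity, specificity, PPV, NPV, accuracy, AUC, and the Cohen's kappa used for interobserver agreement.

import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, cohen_kappa_score

def reader_metrics(y_true, y_pred, y_score):
    # y_true/y_pred: per-breast cancer status and reader call (0/1);
    # y_score: reader suspicion score (e.g., BI-RADS-derived) for the AUC.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "auc": roc_auc_score(y_true, y_score),
    }

# Interobserver agreement on biopsy recommendations between two readers:
# kappa = cohen_kappa_score(reader1_calls, reader2_calls)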
{"title":"Effectiveness of AI-CAD Software for Breast Cancer Detection in Automated Breast Ultrasound.","authors":"Sung Ui Shin, Mijung Jang, Bo La Yun, Su Min Cho, Ji Eun Park, Juyeon Lee, Hye Shin Ahn, Bohyoung Kim, Sun Mi Kim","doi":"10.1007/s10278-025-01786-y","DOIUrl":"https://doi.org/10.1007/s10278-025-01786-y","url":null,"abstract":"<p><p>To assess the diagnostic performance and clinical usefulness of deep learning-based computer-aided detection (AI-CAD) for automated breast ultrasound (ABUS) across radiologists with varying ABUS experience. This retrospective study included 114 women (228 breasts) who underwent ABUS in 2019. Three radiologists interpreted images with and without AI-CAD. We evaluated sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and area under the curve (AUC), reading time and interobserver agreement in Breast Imaging Reporting & Data System (BI-RADS) categorization and biopsy recommendations. Among 114 women (50.9 ± 10.8 years), 28 were diagnosed with breast cancer. The following performance metrics improved significant with AI-CAD: Reader 1 (least experienced of ABUS; 2 years of ABUS experience), AUC, 0.837 to 0.947 (p = 0.009), and NPV, 95.8% to 98.4% (p = 0.022); Reader 2 (7 years of experience), PPV, 50.0% to 59.5% (p = 0.042); Reader 3 (8 years of experience), PPV, 55.6% to 66.7% (p = 0.034). Reader 1 with AI-CAD achieved a performance comparable or higher than those of more experienced readers without AI. Specifically: compared with Reader 2, specificity (93.5% vs. 88.0%), PPV (65.8% vs. 50.0%), and accuracy (93.0% vs. 87.7%) were higher. Although Reader 3 originally demonstrated higher NPV (98.4% vs. 95.8%) and AUC (0.954 vs. 0.837) without CAD, these differences were no longer significant when Reader 1 used AI-CAD. Across all readers, AI-CAD reduced the mean reading time by an average of 25 s (p < 0.001). Inter-observer agreement after AI-CAD use (BI-RADS κ: 0.279 → 0.363; biopsy recommendation κ: 0.666 → 0.736) showed no statistically significant difference. AI-CAD enhanced diagnostic performance and reading efficiency in ABUS interpretation, demonstrating the most pronounced improvement for the less experienced reader.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145812588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-22. DOI: 10.1007/s10278-025-01769-z
Huseyin Simsek, Abdulsamet Aktas, Hamza Osman Ilhan, Nagihan Kara Simsek, Yasin Yasa, Esra Ozcelik, Ayse Betul Oktay
This study presents a two-phase methodology for estimating chronological age in children using panoramic dental images and deep learning-based feature extraction. The dataset comprised 626 panoramic radiographs from children aged 6.0 to 13.8 years (320 males, 306 females; mean age = 9.88 years). Two expert dentists annotated each radiograph according to the Demirjian stages of seven mandibular teeth. In the first phase, three architectures (ResNet-18, EfficientNetV2-M, and Swin V2 Base) were trained separately for males and females to extract high-dimensional feature representations. Images were preprocessed via intensity quantization, histogram equalization, segmentation, and resizing to standardized 224 × 224 pixel inputs. From the fully connected layer of the Swin V2 Base model, 512 features were extracted for each tooth, and the concatenation of the seven teeth yielded a 3584-dimensional feature vector per subject. These feature vectors were then used for regression analysis to predict chronological age on a day-level scale. In the second phase, nine machine learning regression models (LightGBM, RandomForest, ExtraTrees, GradientBoosting, XGBoost, KNN, SVR, MLP, and Gaussian Process Regression) were trained using the extracted features. Pairwise t-test analysis identified ExtraTrees as the best-performing model, with statistically significant differences from the other regressors. For this model, RMSE and MAE were 6.98 and 5.18 months for females, and 6.55 and 5.01 months for males. SHAP-based analysis highlighted the second molar (M2) and first premolar (P1) as the most influential features for females, and the first premolar (P1) and second molar (M2) for males. This automated pipeline enhances age prediction accuracy, reduces observer variability, and provides a reliable tool for clinical and forensic dental age estimation. Future work will explore dataset expansion, multimodal integration, and refined model architectures.
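The second phase can be illustrated with a short sketch: per-tooth 512-dimensional embeddings (assumed precomputed from the Swin V2 Base model) are concatenated into a 3584-dimensional vector and regressed to age in days with ExtraTrees. The variable names, tree count, and cross-validation settings are illustrative, not the authors' exact configuration.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

def build_feature_matrix(tooth_features, tooth_order):
    # tooth_features: dict mapping each of the seven mandibular teeth to an
    # (n_subjects, 512) embedding array; concatenation gives (n_subjects, 3584).
    return np.concatenate([tooth_features[t] for t in tooth_order], axis=1)

def fit_age_regressor(X, age_in_days):
    reg = ExtraTreesRegressor(n_estimators=500, random_state=0)
    mae = -cross_val_score(reg, X, age_in_days, cv=3,
                           scoring="neg_mean_absolute_error").mean()
    reg.fit(X, age_in_days)
    return reg, mae  # MAE in days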
{"title":"Improving Chronological Age Estimation in Children Using the Demirjian Method Enhanced with Transformer and Regression Models.","authors":"Huseyin Simsek, Abdulsamet Aktas, Hamza Osman Ilhan, Nagihan Kara Simsek, Yasin Yasa, Esra Ozcelik, Ayse Betul Oktay","doi":"10.1007/s10278-025-01769-z","DOIUrl":"https://doi.org/10.1007/s10278-025-01769-z","url":null,"abstract":"<p><p>This study presents a two-phase methodology for estimating chronological age in children using panoramic dental images and deep learning-based feature extraction. The dataset comprised 626 panoramic radiographs from children aged 6.0 to 13.8 years (320 males, 306 females; mean age = 9.88 years). Two expert dentists annotated each radiograph according to the Demirjian stages of seven mandibular teeth. In the first phase, three architectures-ResNet-18, EfficientNetV2-M, and Swin V2 Base-were trained separately for males and females to extract high-dimensional feature representations. Images were preprocessed via intensity quantization, histogram equalization, segmentation, and resizing to standardized 224 × 224 pixel inputs. From the fully connected layer of the Swin V2 Base model, 512 features were extracted for each tooth, and the concatenation of seven teeth yielded a 3584-dimensional feature vector per subject. These feature vectors were then used for regression analysis to predict chronological age on a day-level scale. In the second phase, nine machine learning regression models-LightGBM, RandomForest, ExtraTrees, GradientBoosting, XGBoost, KNN, SVR, MLP, and Gaussian Process Regression-were trained using the extracted features. Pairwise t-test analysis revealed ExtraTrees as the most statistically significant model. For this model, RMSE and MAE were 6.98 and 5.18 months for females, and 6.55 and 5.01 months for males. SHAP-based analysis highlighted the second molar (M2) and first premolar (P1) as the most influential features for females, and the first premolar (P1) and second molar (M2) for males. This automated pipeline enhances age prediction accuracy, reduces observer variability, and provides a reliable tool for clinical and forensic dental age estimation. Future work will explore dataset expansion, multimodal integration, and refined model architectures.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145812613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-22. DOI: 10.1007/s10278-025-01791-1
Fatih Gelir, Taymaz Akan, Owen T Carmichael, Md Shenuarin Bhuiyan, Steven A Conrad, John A Vanchiere, Christopher G Kevil, Mohammad Alfrad Nobel Bhuiyan
The automated analysis of medical images is crucial for early disease detection. In recent years, deep learning has become popular for medical image analysis. In this study, we employed color-based topological features with deep learning for pattern recognition. The data topology provides information about the image's shape and global features such as connectivity and holes. We used different color channels to identify changes in topological footprints by altering the image's color. We extracted topological, local binary pattern (LBP), and Gabor features and used machine learning and deep learning models for disease classification. The model's performance was tested using three open-source fundus image databases: the Asia Pacific Tele-ophthalmology Society (APTOS 2019) data, the Optic Retinal Image Database for Glaucoma Analysis (ORIGA), and the Automatic Detection Challenge on Age-Related Macular Degeneration (ICHALLENGE-AMD). We have found that topological features from different color models provide important information for disease diagnosis.
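One way to compute such color-channel topological footprints alongside LBP and Gabor texture features is sketched below, assuming the gudhi and scikit-image libraries; the specific summaries and parameters are illustrative and may differ from those used in the study.

import numpy as np
import gudhi
from skimage.feature import local_binary_pattern
from skimage.filters import gabor

def channel_features(channel):
    # channel: 2-D array holding one color channel of a fundus image.
    # Topological footprint: persistence of a cubical complex on intensities
    # captures global structure such as connected components and holes.
    cc = gudhi.CubicalComplex(top_dimensional_cells=channel)
    diag = cc.persistence()
    lifetimes = [d - b for _, (b, d) in diag if d != float("inf")]
    topo = [len(lifetimes), float(np.sum(lifetimes)), float(np.max(lifetimes, initial=0.0))]
    # Local texture descriptors used alongside topology.
    lbp = local_binary_pattern(channel, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, density=True)
    gabor_real, _ = gabor(channel, frequency=0.2)
    return np.concatenate([topo, lbp_hist, [gabor_real.mean(), gabor_real.std()]])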
{"title":"Topological Feature Extraction from Multi-color Channels for Pattern Recognition: An Application to Fundus Image Analysis.","authors":"Fatih Gelir, Taymaz Akan, Owen T Carmichael, Md Shenuarin Bhuiyan, Steven A Conrad, John A Vanchiere, Christopher G Kevil, Mohammad Alfrad Nobel Bhuiyan","doi":"10.1007/s10278-025-01791-1","DOIUrl":"https://doi.org/10.1007/s10278-025-01791-1","url":null,"abstract":"<p><p>The automated analysis of medical images is crucial for early disease detection. In recent years, deep learning has become popular for medical image analysis. In this study, we employed color-based topological features with deep learning for pattern recognition. The data topology provides information about the image's shape and global features such as connectivity and holes. We used different color channels to identify changes in topological footprints by altering the image's color. We extracted topological, local binary pattern (LBP), and Gabor features and used machine learning and deep learning models for disease classification. The model's performance was tested using three open-source fundus image databases: the Asia Pacific Tele-ophthalmology Society (APTOS 2019) data, the Optic Retinal Image Database for Glaucoma Analysis (ORIGA), and the Automatic Detection Challenge on Age-Related Macular Degeneration (ICHALLENGE-AMD). We have found that topological features from different color models provide important information for disease diagnosis.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145812597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-22. DOI: 10.1007/s10278-025-01763-5
D Meghana, A Manimaran
Early dental caries detection is essential for timely diagnosis and treatment. However, current deep learning (DL) models exhibit inconsistent accuracy across different dental X-ray datasets, revealing limitations in their robustness and adaptability. To automate caries detection in intraoral periapical radiographs, this study presents a hybrid object detector that integrates a Swin-T transformer with a YOLOv8s backbone. The model was trained on 1887 radiographs collected from the Sibar Institute of Dental Sciences, Guntur. To detect dental caries within intricate intraoral structures, the Swin-T component improves feature extraction through its hierarchical attention mechanism, outperforming convolutional neural network (CNN)-based models in both spatial understanding and contextual awareness. We evaluated the method against single-stage YOLOv8 variants (n, s, m, l) and a representative two-stage detector (Faster R-CNN with ResNet-50-FPNv2) under a consistent protocol. The proposed YOLOv8s+Swin-T outperformed all baselines in precision, recall, F1-score, and mAP@0.5, achieving 0.97 for precision/recall/F1 and 0.99 for mAP@0.5. These results underscore the model's clinical applicability and robustness, providing a reliable tool for accurate caries detection and supporting routine AI-assisted diagnosis.
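For context, a minimal fine-tuning sketch for the YOLOv8s half of the detector is shown below using the ultralytics API; the Swin-T backbone integration described in the paper is a custom modification and is not reproduced here, and the dataset config path, epochs, and batch size are assumptions.

from ultralytics import YOLO

# Fine-tune a stock YOLOv8s detector on periapical radiographs.
# 'caries.yaml' is a hypothetical dataset configuration file.
model = YOLO("yolov8s.pt")
model.train(data="caries.yaml", epochs=100, imgsz=640, batch=16)
metrics = model.val()                      # precision, recall, mAP@0.5, etc.
preds = model.predict("periapical_example.png", conf=0.25)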
{"title":"A Hybrid YOLOv8s+Swin-T Transformer Approach for Automated Caries Detection on Periapical Radiographs.","authors":"D Meghana, A Manimaran","doi":"10.1007/s10278-025-01763-5","DOIUrl":"https://doi.org/10.1007/s10278-025-01763-5","url":null,"abstract":"<p><p>Early dental caries detection is essential for timely diagnosis and treatment. However, current deep learning (DL) models exhibit inconsistent accuracy across different dental X-ray datasets, revealing limitations in their robustness and adaptability. To automate caries detection in intraoral periapical radiographs, this study presents a hybrid object detector that integrates a Swin-T transformer with a YOLOv8s backbone. The model was trained on 1887 radiographs collected from the Sibar Institute of Dental Sciences, Guntur. To detect dental caries in intricate intraoral structures, this work presents an improved feature extraction through its hierarchical attention mechanism that outperforms convolutional neural network (CNN)-based models in both spatial understanding and contextual awareness. We evaluated the method against single-stage YOLOv8 variants (n, s, m, l) and a representative two-stage detector (Faster R-CNN with ResNet-50-FPNv2) under a consistent protocol. The proposed YOLOv8s+Swin-T outperformed all baselines in precision, recall, F1-score, and mAP@0.5, achieving 0.97 for precision/recall/F1 and 0.99 for mAP@0.5. These results underscore the model's clinical applicability and robustness, providing a reliable tool for accurate caries detection and supporting routine AI-assisted diagnosis.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145812599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19. DOI: 10.1007/s10278-025-01748-4
Md Enamul Hoq, Lawrence Tarbox, Donald Johann, Linda Larson-Prior, Fred Prior
Low-dose CT (LDCT) screening reduces lung cancer mortality but yields high false-positive findings, motivating practical AI that aligns with slice-based clinical review. We evaluate a label-efficient 2D pipeline that uses frozen RAD-DINO embeddings from native-resolution axial slices and a lightweight multilayer perceptron for patient-level risk estimation via mean aggregation and isotonic calibration. Using the NLST CT arm with outcomes defined over a 0-24-month window, we construct a fixed patient-level split (one CT per patient; no cross-split leakage) and perform 25 repeated imbalanced test draws (~6% prevalence) to approximate screening conditions. At screening prevalence, RAD-DINO + MLP achieves PR-AUC = 0.705 (calibrated; raw 0.554) and ROC-AUC = 0.817 (raw; 0.736 calibrated), with improved probability reliability following calibration; operating points are selected on validation and reported on test. For secondary ablations only, a near-balanced cohort (N = 1984) yields accuracy 0.966, precision 0.974, recall 0.973, F1 0.973, and ROC-AUC 0.912. Beyond classification, retrieval with triplet-fine-tuned embeddings attains Precision@5 = 0.853. Interpretability analyses show that cancer cases sustain higher top-k slice scores and that directional SHAP concentrates on a small subset of high-probability slices; label-colored t-SNE provides qualitative views of embedding structure. Limitations include single-cohort evaluation, lack of Lung-RADS labels in public NLST, and a CXR → CT pretraining shift; future work will pursue external validation and CT-native self-supervised continuation. Overall, frozen 2D foundation embeddings provide a strong, transparent, and computationally practical starting point for LDCT screening workflows under realistic prevalence.
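The aggregation-and-calibration stage of this pipeline can be sketched as follows, assuming frozen RAD-DINO slice embeddings are already extracted and slice labels are inherited from the patient-level outcome; the MLP size and other settings are illustrative, not the paper's exact configuration.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.isotonic import IsotonicRegression

def patient_scores(mlp, slice_embeddings):
    # Mean aggregation of slice-level probabilities to one score per patient.
    return np.array([mlp.predict_proba(e)[:, 1].mean() for e in slice_embeddings])

def fit_pipeline(train_emb, train_slice_labels, val_emb, val_patient_labels):
    # train_emb: list of (n_slices_i, d) frozen embedding arrays per patient.
    mlp = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
    mlp.fit(np.vstack(train_emb), np.concatenate(train_slice_labels))
    # Isotonic calibration of the patient-level scores on the validation split.
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(patient_scores(mlp, val_emb), val_patient_labels)
    return mlp, iso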
{"title":"Harnessing Native-Resolution 2D Embeddings for Lung Cancer Classification: A Feasibility Study with the RAD-DINO Self-supervised Foundation Model.","authors":"Md Enamul Hoq, Lawrence Tarbox, Donald Johann, Linda Larson-Prior, Fred Prior","doi":"10.1007/s10278-025-01748-4","DOIUrl":"https://doi.org/10.1007/s10278-025-01748-4","url":null,"abstract":"<p><p>Low‑dose CT (LDCT) screening reduces lung cancer mortality but yields high false‑positive findings, motivating practical AI that aligns with slice‑based clinical review. We evaluate a label‑efficient 2D pipeline that uses frozen RAD‑DINO embeddings from native‑resolution axial slices and a lightweight multilayer perceptron for patient‑level risk estimation via mean aggregation and isotonic calibration. Using the NLST CT arm with outcomes defined over a 0-24‑month window, we construct a fixed patient‑level split (one CT per patient; no cross‑split leakage) and perform 25 repeated imbalanced test draws (~ 6% prevalence) to approximate screening conditions. At screening prevalence, RAD‑DINO + MLP achieves PR‑AUC = 0.705 (calibrated; raw 0.554) and ROC‑AUC = 0.817 (raw; 0.736 calibrated), with improved probability reliability following calibration; operating points are selected on validation and reported on test. For secondary ablations only, a near‑balanced cohort (N = 1984) yields accuracy 0.966, precision 0.974, recall 0.973, F1 0.973, and ROC‑AUC 0.912. Beyond classification, retrieval with triplet‑fine‑tuned embeddings attains Precision@5 = 0.853. Interpretability analyses show that cancer cases sustain higher top‑k slice scores and that directional SHAP concentrates on a small subset of high‑probability slices; label‑colored t‑SNE provides qualitative views of embedding structure. Limitations include single‑cohort evaluation, lack of Lung‑RADS labels in public NLST, and a CXR → CT pretraining shift; future work will pursue external validation and CT‑native self‑supervised continuation. Overall, frozen 2D foundation embeddings provide a strong, transparent, and computationally practical starting point for LDCT screening workflows under realistic prevalence.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145795865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent breast cancer research has investigated shape-based attention guidance in Vision Transformer (ViT) models, focusing on anatomical structures and the heterogeneity surrounding tumors. However, few studies have clarified the optimal transformer encoder layer stage for applying attention guidance. Our study aimed to evaluate the effectiveness of shape-guidance strategies by varying the combinations of encoder layers that guide attention to breast structures and by comparing the proposed models with conventional models. For the shape-guidance strategy, we applied breast masks to the attention mechanism to emphasize spatial dependencies and enhance the learning of positional relationships within breast anatomy. We then compared the representative models-Masked Transformer models that demonstrated the best performance across layer combinations-with the conventional ResNet50, ViT, and SwinT V2. In our study, a total of 2,436 publicly available mammography images from the Chinese Mammography Database via The Cancer Imaging Archive were analyzed. Three-fold cross-validation was employed, with a patient-wise split of 70% for training and 30% for validation. Model performance on differentiating breast cancer from non-cancer images was assessed by the area under the receiver-operating characteristic curve (AUROC). The results showed that applying masks at the Shallow and Deep stages gave the highest AUROC for Masked ViT. The Masked ViT achieved an AUROC of 0.885 [95% confidence interval: 0.849-0.918], a sensitivity of 0.876, and a specificity of 0.802, outperforming all other conventional models. These results indicate that incorporating mask guidance into particular Transformer encoders promotes representation learning, highlighting their potential as decision-support tools in breast cancer diagnosis.
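One plausible realization of the mask guidance inside a transformer encoder layer is sketched below: patch tokens outside the breast mask receive a large negative bias before the attention softmax, so attention concentrates on breast anatomy. This is an assumption about the mechanism, not the authors' implementation.

import torch
import torch.nn.functional as F

def shape_guided_attention(q, k, v, breast_mask_tokens):
    # q, k, v: (batch, heads, tokens, head_dim) projections in one encoder layer.
    # breast_mask_tokens: (batch, tokens) boolean, True for patches inside the
    # segmented breast region.
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale             # (batch, heads, n, n)
    bias = (~breast_mask_tokens).float() * -1e4          # suppress non-breast keys
    attn = attn + bias[:, None, None, :]
    return F.softmax(attn, dim=-1) @ v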
{"title":"Transformer-based Deep Learning Models with Shape Guidance for Predicting Breast Cancer in Mammography Images.","authors":"Kengo Takahashi, Yuwen Zeng, Zhang Zhang, Kei Ichiji, Takuma Usuzaki, Ryusei Inamori, Haoyang Liu, Noriyasu Homma","doi":"10.1007/s10278-025-01773-3","DOIUrl":"https://doi.org/10.1007/s10278-025-01773-3","url":null,"abstract":"<p><p>Recent breast cancer research has investigated shape-based attention guidance in Vision Transformer (ViT) models, focusing on anatomical structures and the heterogeneity surrounding tumors. However, few studies have clarified the optimal transformer encoder layer stage for applying attention guidance. Our study aimed to evaluate the effectiveness of shape-guidance strategies by varying the combinations of encoder layers that guide attention to breast structures and by comparing the proposed models with conventional models. For the shape-guidance strategy, we applied breast masks to the attention mechanism to emphasize spatial dependencies and enhance the learning of positional relationships within breast anatomy. We then compared the representative models-Masked Transformer models that demonstrated the best performance across layer combinations-with the conventional ResNet50, ViT, and SwinT V2. In our study, a total of 2,436 publicly available mammography images from the Chinese Mammography Database via The Cancer Imaging Archive were analyzed. Three-fold cross-validation was employed, with a patient-wise split of 70% for training and 30% for validation. Model performance on differentiating breast cancer from non-cancer images was assessed by the area under the receiver-operating characteristic curve (AUROC). The results showed that applying masks at the Shallow and Deep stages gave the highest AUROC for Masked ViT. The Masked ViT achieved an AUROC of 0.885 [95% confidence interval: 0.849-0.918], a sensitivity of 0.876, and a specificity of 0.802, outperforming all other conventional models. These results indicate that incorporating mask guidance into particular Transformer encoders promotes representation learning, highlighting their potential as decision-support tools in breast cancer diagnosis.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145795921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The need for dynamic and static acquisitions under stress and rest in myocardial perfusion positron emission tomography (PET) is burdensome, and the short half-life of N-13 ammonia places additional constraints on scanning protocols. This study investigated whether combining static PET-derived images with clinical parameters via machine learning predicts major adverse cardiac events (MACE) to expand the utility of ammonia PET without dynamic scanning. The cohort comprised 386 patients, and during a mean follow-up of 345 days, MACE occurred in 35 patients. We applied stratified fivefold cross-validation based on MACE prediction with balanced and random shuffles. A logistic regression model was trained using all the explanatory variables after removing highly collinear features. Based on the cumulative importance of MACE prediction, a model was developed using the minimum number of top-ranked features accounting for > 50% of total cumulative importance. The predictive performance of a simple threshold-based classification using myocardial flow reserve (MFR) < 2.0 was also evaluated for comparison. The model was trained on 308 cases using three features: age, dyslipidemia, and resting end-diastolic volume. When tested on an independent set of 78 cases with fivefold cross-validation, it achieved an accuracy of 0.74 ± 0.06, a sensitivity of 0.74 ± 0.23, and a specificity of 0.74 ± 0.07. The accuracy, sensitivity, and specificity of simple MFR < 2.0 prediction were 0.58 ± 0.05, 0.77 ± 0.13, and 0.56 ± 0.05, respectively. A multimodal machine learning approach potentially serves as a clinically useful alternative to dynamic PET scans.
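The feature-selection logic described above (dropping highly collinear variables, then keeping the smallest set of top-ranked features that exceeds 50% cumulative importance) can be sketched as follows; the correlation threshold and the use of absolute logistic-regression coefficients as the importance measure are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def drop_collinear(df, threshold=0.9):
    # df: pandas DataFrame of clinical and static-PET features.
    # Remove one feature from each highly correlated pair.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)

def fit_mace_model(X, y):
    # Stratified fivefold cross-validation with shuffling, then a final fit.
    model = LogisticRegression(max_iter=1000)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return model.fit(X, y), acc.mean()

def top_features(model, feature_names, cumulative=0.5):
    # Keep the smallest top-ranked set covering > 50% of total importance.
    imp = np.abs(model.coef_).ravel()
    order = np.argsort(imp)[::-1]
    frac = np.cumsum(imp[order]) / imp.sum()
    k = int(np.searchsorted(frac, cumulative) + 1)
    return [feature_names[i] for i in order[:k]]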
{"title":"Multimodal Machine Learning Integrating N-13 Ammonia PET and Clinical Variables Predicts Major Adverse Cardiac Events.","authors":"Ryo Mikurino, Michinobu Nagao, Masateru Kawakubo, Atsushi Yamamoto, Risako Nakao, Yuka Matsuo, Akiko Sakai, Shuji Sakai","doi":"10.1007/s10278-025-01779-x","DOIUrl":"https://doi.org/10.1007/s10278-025-01779-x","url":null,"abstract":"<p><p>The need for dynamic and static acquisitions under stress and rest in myocardial perfusion positron emission tomography (PET) is burdensome, and the short half-life of N-13 ammonia places additional constraints on scanning protocols. This study investigated whether combining static PET-derived images with clinical parameters via machine learning predicts major adverse cardiac events (MACE) to expand the utility of ammonia PET without dynamic scanning. The cohort comprised 386 patients, and during a mean follow-up of 345 days, MACE occurred in 35 patients. We applied stratified fivefold cross-validation based on MACE prediction with balanced and random shuffles. A logistic regression model was trained using all the explanatory variables after removing highly collinear features. Based on the cumulative importance of MACE prediction, a model was developed using the minimum number of top-ranked features accounting for > 50% of total cumulative importance. The predictive performance of a simple threshold-based classification using myocardial flow reserve (MFR) < 2.0 was also evaluated for comparison. The model was trained on 308 cases using three features: age, dyslipidemia, and resting end-diastolic volume. When tested on an independent set of 78 cases with fivefold cross-validation, it achieved an accuracy of 0.74 ± 0.06, a sensitivity of 0.74 ± 0.23, and a specificity of 0.74 ± 0.07. The accuracy, sensitivity, and specificity of simple MFR < 2.0 prediction were 0.58 ± 0.05, 0.77 ± 0.13, and 0.56 ± 0.05, respectively. A multimodal machine learning approach potentially serves as a clinically useful alternative to dynamic PET scans.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145795869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-19. DOI: 10.1007/s10278-025-01794-y
Zari O'Connor, Austin Fullenkamp, Morgan P McBee
Ventriculoperitoneal (VP) shunts are a mainstay treatment for hydrocephalus, and identifying the correct valve type is essential to determine its setting. The proliferation of valve models over the last few decades makes identification on radiographs more difficult and time consuming. We trained a deep learning model to detect and classify VP shunts on radiographs. We curated 2263 skull radiographs spanning 11 valve types and split the data 80/10/10 for training/validation/test. A YOLOv8-large model was fine-tuned with augmentation and hyperparameter search (300 epochs, batch 32, 640 × 640 input). The fine-tuned YOLOv8 model demonstrated robust performance on the test set, achieving a precision (P) of 0.949, recall (R) of 0.930, mean average precision (mAP50) of 0.951, mAP50-90 of 0.755, and a maximum F1 score of 0.952 across all classes at a confidence threshold of 0.524. For individual valve types, mAP50 ranged from 0.841 to 0.995, mAP50-90 from 0.562 to 0.963, P from 0.8 to 1, and R from 0.667 to 1, demonstrating that the model performs well overall but performs better for some valve types than others. The fine-tuned YOLOv8 model demonstrates high accuracy and generalization across a range of valve types, suggesting its potential for clinical application. Compared with prior purely classification approaches, detection explicitly localizes each valve, accommodates patients with multiple valves, and enables creation of downstream models to determine valve settings.
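For illustration, inference with a detector of this kind at the reported operating point (confidence threshold 0.524) could look like the sketch below using the ultralytics API; the weights filename and image path are hypothetical.

from ultralytics import YOLO

# Load hypothetical fine-tuned YOLOv8-large weights and run detection.
model = YOLO("vp_shunt_yolov8l.pt")
results = model.predict("skull_radiograph.png", imgsz=640, conf=0.524)
for r in results:
    for box in r.boxes:
        valve_type = model.names[int(box.cls)]   # one of the 11 valve classes
        print(valve_type, float(box.conf), box.xyxy.tolist())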
{"title":"Artificial Intelligence Detection and Classification of Ventriculoperitoneal Shunt Valves Utilizing Fine-Tuning of a Detection Model.","authors":"Zari O'Connor, Austin Fullenkamp, Morgan P McBee","doi":"10.1007/s10278-025-01794-y","DOIUrl":"https://doi.org/10.1007/s10278-025-01794-y","url":null,"abstract":"<p><p>Ventriculoperitoneal (VP) shunts are a mainstay treatment for hydrocephalus and identifying the correct valve type is essential to determine its setting. The proliferation of valve models over the last few decades makes identification on radiographs more difficult and time consuming. We trained a deep learning model to detect and classify VP shunts on radiographs. We curated 2263 skull radiographs spanning 11 valve types and split the data 80/10/10 for training/validation/test. A YOLOv8-large model was fine-tuned with augmentation and hyperparameter search (300 epochs, batch 32, 640 × 640 input). The fine-tuned YOLOv8 model demonstrated robust performance on the test set, achieving a precision (P) of 0.949, recall (R) of 0.930, mean average precision (mAP50) of 0.951, mAP50-90 of 0.755, and max F1 score of 0.952 across all classes at a confidence threshold of 0.524. For individual valve types, mAP50 ranged from 0.841 to 0.995, mAP50-90 from 0.562 to 0.963, P from 0.8 to 1, and R from 0.667 to 1 demonstrating that the model overall performs well but does perform better for some valve types. The fine-tuned YOLOv8 model demonstrates high accuracy and generalization across a range of valve types, suggesting its potential for clinical application. Compared with prior purely classification approaches, detection explicitly localizes each valve, accommodates patients with multiple valves, and enables creation of downstream models to determine valve settings.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145795897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}