JMI Editor-in-Chief Bennett Landman offers guidance to help authors achieve higher impact, clearer assessment of contributions, and more useful and direct reviews from the peer review community.
Purpose: Accurate cancer subtyping is essential for precision medicine but is challenged by the computational demands of gigapixel whole-slide images (WSIs). Although transformer-based multiple instance learning (MIL) methods achieve strong performance, their quadratic complexity limits clinical deployment. We introduce LiteMIL, a computationally efficient cross-attention MIL method optimized for WSI classification.
Approach: LiteMIL employs a single learnable query with multi-head cross-attention for bag-level aggregation from extracted features. We evaluated LiteMIL against five baselines (mean/max pooling, ABMIL, MAD-MIL, and TransMIL) on four TCGA datasets (breast, kidney, lung, and TUPAC16) using nested cross-validation with patient-level splitting. Systematic ablation studies evaluated multi-query variants, attention heads, dropout rates, and architectural components.
Results: LiteMIL achieved competitive accuracy (average 83.5%), matching TransMIL, while offering substantial efficiency gains: fewer parameters (560K versus 2.67M), faster inference (1.6 s versus 4.6 s per fold), and lower graphics processing unit (GPU) memory usage (1.15 GB versus 7.77 GB). LiteMIL excelled on lung (86.3% versus 85.0%) and TUPAC16 (72% versus 71.4%) and matched kidney performance (89.9% versus 89.7%). Ablation studies revealed task-dependent benefits of multiple queries: multi-query variants improved performance on morphologically heterogeneous tasks (breast and lung) but degraded it on the grading task (TUPAC16), validating single-query optimality for focused attention scenarios.
Conclusions: LiteMIL provides a resource-efficient solution for WSI classification. The cross-attention architecture matches the performance of more complex transformers while enabling deployment on consumer GPUs. Task-dependent design insights (a single query for sparse discriminative features, multiple queries for heterogeneous patterns) guide practical implementation. The architecture's efficiency, combined with compact features, makes LiteMIL suitable for clinical integration in settings with limited computational infrastructure.
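The single-query aggregation idea at the heart of LiteMIL can be sketched as follows. This is an illustrative single-head version with identity projections, not the paper's implementation; note that the cost is linear in the number of instances, unlike quadratic self-attention over the bag.

```python
import numpy as np

def cross_attention_pool(X, q, Wk, Wv):
    """Pool a bag of n instance features X (n, d) into one bag-level
    vector using a single learnable query q (d,).  Each instance gets
    one attention score, so cost grows linearly with bag size."""
    K = X @ Wk                         # keys   (n, d)
    V = X @ Wv                         # values (n, d)
    logits = K @ q / np.sqrt(q.size)   # one score per instance
    w = np.exp(logits - logits.max())
    w /= w.sum()                       # softmax over the bag
    return w @ V                       # attention-weighted sum, shape (d,)

rng = np.random.default_rng(0)
n, d = 8, 4
X = rng.normal(size=(n, d))            # toy "patch features"
q = rng.normal(size=d)                 # toy learnable query
z = cross_attention_pool(X, q, np.eye(d), np.eye(d))
```

In the multi-head variant described in the abstract, several such scorings run in parallel over projected subspaces and their outputs are concatenated before the bag-level classifier.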
Purpose: Centralized machine learning often struggles with limited data access and expert involvement. We investigate decentralized approaches that preserve data privacy while enabling collaborative model training for medical imaging tasks.
Approach: We explore asynchronous federated learning (FL) using the FL with buffered asynchronous aggregation (FedBuff) algorithm for classifying optical coherence tomography (OCT) retina images. Unlike synchronous algorithms such as FedAvg, which require all clients to participate simultaneously, FedBuff supports independent client updates. We compare its performance to both centralized models and FedAvg. In addition, we develop a browser-based proof-of-concept system using modern web technologies to assess the feasibility and limitations of interactive, collaborative learning in real-world settings.
Results: FedBuff performs well in binary OCT classification tasks but shows reduced accuracy in more complex, multiclass scenarios. FedAvg achieves results comparable to centralized training, consistent with previous findings. Although FedBuff underperforms compared with FedAvg and centralized models, it still delivers acceptable accuracy in less complex settings. The browser-based prototype demonstrates the potential for accessible, user-driven FL systems but also highlights technical limitations in current web standards, especially regarding local computation and communication efficiency.
Conclusion: Asynchronous FL via FedBuff offers a promising, privacy-preserving approach for medical image classification, particularly when synchronous participation is impractical. However, its scalability to complex classification tasks remains limited. Web-based implementations have the potential to broaden access to collaborative AI tools, but limitations of the current technologies need to be further investigated.
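The buffered asynchronous aggregation that distinguishes FedBuff from FedAvg can be sketched as below. Class and parameter names are illustrative, and the simple mean over buffered deltas stands in for the algorithm's weighted server update; clients send updates whenever they finish, and the server applies them once the buffer fills.

```python
import numpy as np

class FedBuffServer:
    """Minimal sketch of FedBuff-style buffered asynchronous
    aggregation: client deltas accumulate in a buffer and are
    averaged into the global model once K updates have arrived,
    so no synchronized participation round is required."""
    def __init__(self, model, buffer_size=3, lr=1.0):
        self.model = np.asarray(model, float)
        self.buffer = []
        self.K = buffer_size
        self.lr = lr

    def receive(self, client_delta):
        self.buffer.append(np.asarray(client_delta, float))
        if len(self.buffer) >= self.K:
            avg = np.mean(self.buffer, axis=0)  # aggregate buffered deltas
            self.model += self.lr * avg         # apply to global model
            self.buffer.clear()

server = FedBuffServer(np.zeros(2), buffer_size=2)
server.receive([1.0, 0.0])   # buffered, no update yet
server.receive([0.0, 1.0])   # buffer full -> model += mean of deltas
```

By contrast, FedAvg would block until every selected client reported back before averaging, which is exactly the synchronization requirement the abstract highlights.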
Purpose: There are multiple commercially available, Food and Drug Administration (FDA)-cleared, artificial intelligence (AI)-based tools automating stroke evaluation in noncontrast computed tomography (NCCT). This study assessed the impact of variations in reconstruction kernel and slice thickness on two outputs of such a system: hypodense volume and Alberta Stroke Program Early CT Score (ASPECTS).
Approach: The NCCT series image data of 67 patients imaged with a CT stroke protocol were reconstructed with four kernels (H10s-smooth, H40s-medium, H60s-sharp, and H70h-very sharp) and three slice thicknesses (1.5, 3.0, and 5.0 mm) to create 1 reference condition (H40s/5.0 mm) and 11 nonreference conditions. The 12 reconstructions per patient were processed with a commercially available FDA-cleared software package that yields total hypodense volume (mL) and ASPECTS. A mixed-effect model was used to test the difference in hypodense volume, and an ordered logistic model was used to test the difference in e-ASPECTS.
Results: Hypodense volume differences from the reference condition reached 1.1 mL and were significant for all nonreference kernels (H10s, H60s, and H70h) and for the thinner slices (1.5 and 3.0 mm). e-ASPECTS was invariant to the nonreference kernels and slice thicknesses, with mean differences of at most 0.5, and no significant differences were found for any kernel or slice thickness.
Conclusions: Automated hypodense volume measured with a commercially available, FDA-cleared software package is substantially impacted by reconstruction kernel and slice thickness. Conversely, automated ASPECTS is invariant to these reconstruction parameters.
Purpose: The credibility of artificial intelligence (AI) models for medical imaging continues to be a challenge, affected by the diversity of models, the data used to train the models, and the applicability of their combination to produce reproducible results for new data. We aimed to explore whether emerging virtual imaging trial (VIT) methodologies can provide an objective resource to approach this challenge.
Approach: We conducted this study for the case example of COVID-19 diagnosis using clinical and virtual computed tomography (CT) and chest radiography (CXR) processed with convolutional neural networks. Multiple AI models were developed and tested using 3D ResNet-like and 2D EfficientNetv2 architectures across diverse datasets.
Results: Model performance was evaluated using the area under the curve (AUC) and the DeLong method for AUC confidence intervals. The models trained on the most diverse datasets showed the highest external testing performance, with AUC values ranging from 0.73 to 0.76 for CT and 0.70 to 0.73 for CXR. Internal testing yielded higher AUC values (0.77 to 0.85 for CT and 0.77 to 1.0 for CXR), highlighting a substantial drop in performance during external validation, which underscores the importance of diverse and comprehensive training and testing data. Most notably, the VIT approach provided an objective assessment of the utility of diverse models and datasets while offering insight into the influence of dataset characteristics, patient factors, and imaging physics on AI efficacy.
Conclusions: The VIT approach enhances model transparency and reliability, offering nuanced insights into the factors driving AI performance and bridging the gap between experimental and clinical settings.
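The AUC values reported above have a useful pairwise interpretation: the empirical AUC equals the Mann-Whitney probability that a randomly chosen positive case scores above a randomly chosen negative one. A minimal sketch with toy scores (not the study's data):

```python
def auc(pos, neg):
    """Empirical ROC AUC via the Mann-Whitney statistic: the fraction
    of (positive, negative) score pairs where the positive scores
    higher, counting ties as half a win."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

perfect = auc([0.9, 0.8], [0.1, 0.2])   # fully separable scores
chance = auc([0.5, 0.5], [0.5, 0.5])    # all scores tied
```

The DeLong method mentioned in the abstract builds confidence intervals on exactly this pairwise statistic by estimating its variance from the per-case contributions.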
Purpose: Cochlear implant (CI) surgery treats severe hearing loss by inserting an electrode array into the cochlea to stimulate the auditory nerve. An important step in this procedure is mastoidectomy, which removes part of the mastoid region of the temporal bone to provide surgical access. Accurate mastoidectomy shape prediction from preoperative imaging improves presurgical planning, reduces risks, and enhances surgical outcomes. Despite its importance, there are limited deep-learning-based studies regarding this topic due to the challenges of acquiring ground-truth labels. We address this gap by investigating self-supervised and weakly-supervised learning models to predict the mastoidectomy region without human annotations.
Approach: We propose a hybrid self-supervised and weakly-supervised learning framework to predict the mastoidectomy region directly from preoperative CT scans, in which the mastoid remains intact. Our self-supervised learning approach reconstructs the postmastoidectomy 3D surface from preoperative imaging, aiming to align with the corresponding intraoperative microscope views for future surgical navigation applications. Postoperative CT scans are used in the self-supervised learning model to assist training, despite the additional challenges, such as metal artifacts and low signal-to-noise ratios, that they introduce. To further improve accuracy and robustness, we introduce a Mamba-based weakly-supervised model that refines the mastoidectomy shape prediction using a 3D T-distribution loss function inspired by the Student-t distribution. Weak supervision is achieved by leveraging segmentation results from the prior self-supervised framework, eliminating the manual data labeling process.
Results: Our hybrid method achieves a mean Dice score of 0.72 when predicting the complex and boundary-less mastoidectomy shape, surpassing state-of-the-art approaches and demonstrating strong performance. The method provides groundwork for constructing 3D postmastoidectomy surfaces directly from the corresponding preoperative CT scans.
Conclusion: To our knowledge, this is the first work to integrate self-supervised and weakly-supervised learning for mastoidectomy shape prediction, offering a robust and efficient solution for CI surgical planning while leveraging a 3D T-distribution loss in weakly-supervised medical imaging.
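The appeal of a Student-t-inspired loss is its heavy tails: large per-voxel errors are penalized logarithmically rather than quadratically, so noisy or artifact-corrupted voxels dominate training less. The paper's exact 3D T-distribution loss is not reproduced here; the sketch below is a generic Student-t negative-log-likelihood penalty with an illustrative choice of degrees of freedom `nu`.

```python
import numpy as np

def t_loss(pred, target, nu=2.0):
    """Illustrative heavy-tailed loss from the Student-t negative
    log-likelihood (up to constants): 0.5*(nu+1)*log(1 + e^2/nu).
    Grows like log(e^2) for large errors, unlike squared error."""
    e2 = (np.asarray(pred, float) - np.asarray(target, float)) ** 2
    return float(np.mean(0.5 * (nu + 1.0) * np.log1p(e2 / nu)))

small = t_loss([0.1], [0.0])    # near-correct voxel: tiny penalty
large = t_loss([10.0], [0.0])   # outlier voxel: bounded-growth penalty
```

Compared with a mean-squared-error loss, the ratio `large / small` here is far smaller, which is the robustness property motivating such losses for artifact-laden postoperative CT.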
Purpose: N-staging, a critical component of cancer diagnostics, quantifies the metastatic involvement of lymph nodes and plays an important role in guiding treatment decisions. Manual assessment of lymph nodes on PET/CT scans is time-consuming due to minimal contrast with the surrounding tissue and the strong heterogeneity of lymph node morphology. To streamline the N-staging process, we propose a deep learning-based algorithm that localizes lymph node stations through atlas-to-patient registration, classifies mediastinal lymph node stations as malignant or benign, and subsequently performs automated N-staging. Notably, our model is trained without any pixel-level annotations, i.e., using image-level classification labels only.
Approach: To address the challenge of training without pixel-level annotations, we use prior knowledge of the lymph node station locations through atlas-to-patient registration and deduce pseudo-labels for lymph node station groups from the N-stage to enable weakly supervised network training.
Results: The proposed algorithm's accuracy, sensitivity, and specificity for lymph node station classification are significantly better than those of the standard threshold-based approach used for lymph node assessment in radiological images and of an algorithm for PET lesion segmentation that was trained with segmentation masks. For automated N-staging, its accuracy is on par with that of an algorithm trained with segmentation masks.
Conclusions: The division of the problem setting into subtasks as well as the integration of prior knowledge enables better or comparable performance of models trained with and without segmentation masks.
Purpose: Accurate simulation of breast tissue deformation is essential for reliable image registration between 3D imaging modalities and 2D mammograms, where compression significantly alters tissue geometry. Although finite element analysis (FEA) provides high-fidelity modeling, it is computationally intensive and not well suited for rapid simulations. To address this, the physics-based graph neural network (PhysGNN) has been introduced as a computationally efficient approximation model trained on FEA-generated deformations. We extend prior work by evaluating the performance of PhysGNN on new digital breast phantoms and assessing the impact of training on multiple phantoms.
Approach: PhysGNN was trained on both single-phantom (per-geometry) and multiphantom (multigeometry) datasets generated from incremental FEA simulations. The digital breast phantoms represent the uncompressed state, serving as input geometries for predicting compressed configurations. A leave-one-deformation-out evaluation strategy was used to assess predictive performance under compression.
Results: Training on new digital phantoms confirmed the model's robust performance, though with some variability in prediction accuracy reflecting the diverse anatomical structures. Multiphantom training further enhanced this robustness and reduced prediction errors.
Conclusions: PhysGNN offers a computationally efficient alternative to FEA for simulating breast compression. The results showed that model performance remains robust when trained per-geometry, and further demonstrated that multigeometry training enhances predictive accuracy and robustness for the geometries included in the training set. This suggests a strong potential path toward developing reliable models for generating compressed breast volumes, which could facilitate image registration and algorithm development.
Purpose: Thermal ablation is a minimally invasive therapy used for the treatment of small renal cell carcinoma tumors. Treatment success is evaluated on postablation computed tomography (CT) to determine if the ablation zone covered the tumor with an adequate treatment margin (often 5 to 10 mm). Incorrect margin identification can lead to treatment misassessment, resulting in unnecessary additional ablation. Therefore, segmentation of the renal ablation zone (RAZ) is crucial for treatment evaluation. We aim to develop and assess an accurate deep learning workflow for delineating the RAZ from surrounding tissues in kidney CT images.
Approach: We present an advanced deep learning method using the attention-based U-Net architecture to segment the RAZ. The workflow leverages the strengths of U-Net, enhanced with attention mechanisms, to improve the network's focus on the most relevant parts of the images, resulting in an accurate segmentation.
Results: Our model was trained and evaluated on a dataset comprising 76 patients' annotated RAZs in CT images. Analysis demonstrated strong performance of the proposed workflow in terms of accuracy, precision, Jaccard index, specificity, Hausdorff distance, and mean absolute boundary distance.
Conclusions: We used 3D CT images with RAZs and, for the first time, addressed deep-learning-based RAZ segmentation using parallel CT images. Our framework can effectively segment RAZs, allowing clinicians to automatically determine the ablation margin and making our tool ready for clinical use. Prediction is fast per patient, enabling clinicians to perform quick reviews, especially in time-constrained settings.
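Segmentation studies such as this one and the mastoidectomy work above typically score overlap with the Dice and Jaccard coefficients. A minimal sketch on toy binary masks (any array shape works, including 3D volumes):

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks: 2|A∩B| / (|A|+|B|)."""
    a = np.asarray(a, bool)
    b = np.asarray(b, bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def jaccard(a, b):
    """Jaccard index between two binary masks: |A∩B| / |A∪B|."""
    a = np.asarray(a, bool)
    b = np.asarray(b, bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

m1 = np.array([[1, 1], [0, 0]])   # toy prediction
m2 = np.array([[1, 0], [0, 0]])   # toy ground truth
```

Dice weights the intersection twice and is therefore always at least as large as Jaccard on the same pair of masks; boundary metrics such as the Hausdorff distance complement both by measuring surface disagreement rather than volume overlap.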
Purpose: Accurate airway measurement is critical for bronchitis quantification with computed tomography (CT), yet optimal protocols and the added value of photon-counting CT (PCCT) over energy-integrating CT (EICT) for reducing bias remain unclear. We quantified biomarker accuracy across modalities and protocols and assessed strategies to reduce bias.
Approach: A virtual imaging trial in which 20 bronchitis anthropomorphic models were scanned using a validated simulator for two systems (EICT: SOMATOM Flash; PCCT: NAEOTOM Alpha) at 6.3 and 12.6 mGy. Reconstructions varied in algorithm, kernel sharpness, slice thickness, and pixel size. Pi10 (square-root wall thickness at a 10-mm perimeter) and WA% (wall-area percentage) were compared against ground-truth airway dimensions obtained from the 0.1-mm-precision anatomical models prior to CT simulation. External validation used clinical PCCT and EICT scans.
Results: Simulated airway dimensions agreed with pathological references. PCCT had lower errors than EICT across the segmented generations. Under optimal parameters, PCCT improved Pi10 and WA% accuracy by 26.3% and 64.9%, respectively. Across the tested PCCT and EICT imaging protocols, improvements were associated with sharper kernels (25.8% Pi10, 33.0% WA%), thinner slices (23.9% Pi10, 49.8% WA%), smaller pixels (17.0% Pi10, 23.1% WA%), and higher dose. Clinically, PCCT achieved a higher maximum airway generation and lower variability, mirroring trends in the virtual results.
Conclusions: PCCT improves the accuracy and consistency of airway biomarker quantification relative to EICT, particularly with optimized protocols. The validated virtual platform enables modality-bias assessment and protocol optimization for accurate, reproducible bronchitis measurements.
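The two biomarkers compared above have simple closed forms. Pi10 is commonly computed by regressing the square root of airway wall area against internal perimeter across all measured airways and reading the fit off at a perimeter of 10 mm; WA% is the wall's share of total airway cross-sectional area. Variable names and the synthetic measurements below are illustrative, not the study's data.

```python
import numpy as np

def pi10(perimeters_mm, wall_areas_mm2):
    """Pi10: fit sqrt(wall area) vs. internal perimeter with a line,
    then evaluate the fit at a 10-mm perimeter.  Summarizes wall
    thickening in a single airway-size-standardized number."""
    p = np.asarray(perimeters_mm, float)
    sqrt_wa = np.sqrt(np.asarray(wall_areas_mm2, float))
    slope, intercept = np.polyfit(p, sqrt_wa, 1)
    return slope * 10.0 + intercept

def wa_percent(wall_area, lumen_area):
    """WA%: wall area as a percentage of total airway area."""
    return 100.0 * wall_area / (wall_area + lumen_area)

# Synthetic airways whose sqrt(wall area) grows linearly with perimeter
perims = [5.0, 10.0, 15.0, 20.0]
walls = [(0.2 * p + 1.0) ** 2 for p in perims]
p10 = pi10(perims, walls)
```

Because both biomarkers depend on segmented wall boundaries, blur from soft kernels, thick slices, or large pixels biases the measured wall area directly, which is why the protocol factors listed in the results move Pi10 and WA% accuracy so strongly.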

