Purpose: Accurate segmentation and precise delineation of colorectal polyp structures are crucial for early clinical diagnosis and treatment planning. However, existing polyp segmentation techniques face significant challenges due to the high variability in polyp size and morphology, as well as the frequent indistinctness of polyp-tissue structures.
Approach: To address these challenges, we propose a multiscale attention network with structure guidance (MAN-SG). The core of MAN-SG is a structure extraction module (SEM) designed to capture rich structural information from fine-grained early-stage encoder features. In addition, we introduce a cross-scale structure guided attention (CSGA) module that effectively fuses multiscale features under the guidance of the structural information provided by the SEM, thereby enabling more accurate delineation of polyp structures. MAN-SG is implemented and evaluated using two high-performance backbone networks: Res2Net-50 and PVTv2-B2.
Results: Extensive experiments were conducted on five benchmark datasets for polyp segmentation. The results demonstrate that MAN-SG consistently outperforms existing state-of-the-art methods across these datasets.
Conclusion: The proposed MAN-SG framework, which leverages structural guidance via SEM and CSGA modules, proves to be both highly effective and robust for the challenging task of colorectal polyp segmentation.
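The cross-scale, structure-guided fusion idea behind CSGA can be illustrated with a toy sketch. The real module operates on convolutional feature maps inside a deep network; the function name and the simple position-wise gating rule below are illustrative assumptions, not the paper's implementation:

```python
# Toy illustration of structure-guided fusion across two feature scales.
# A structure map s (values in [0, 1]) gates how much each position
# draws from fine-scale versus coarse-scale features. This is a
# hand-wavy sketch, NOT the actual CSGA module.

def structure_guided_fuse(structure, fine, coarse):
    """Blend fine and coarse features position-wise using a structure map."""
    assert len(structure) == len(fine) == len(coarse)
    return [s * f + (1.0 - s) * c
            for s, f, c in zip(structure, fine, coarse)]

# Positions with strong structure keep fine detail; weak ones fall back
# to the coarse features.
fused = structure_guided_fuse([1.0, 0.5, 0.0], [2.0, 2.0, 2.0], [4.0, 4.0, 4.0])
# fused == [2.0, 3.0, 4.0]
```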
Purpose: Deep learning (DL) is rapidly advancing in computational pathology, offering high diagnostic accuracy but often functioning as a "black box" with limited interpretability. This lack of transparency hinders its clinical adoption, emphasizing the need for quantitative explainable artificial intelligence (QXAI) methods. We propose a QXAI approach to objectively and quantitatively elucidate the reasoning behind DL model decisions in hepatocellular carcinoma (HCC) pathological image analysis.
Approach: The proposed method utilizes clustering in the latent space of embeddings generated by a DL model to identify regions that contribute to the model's discrimination. Each cluster is then quantitatively characterized by morphometric features obtained through nuclear segmentation using HoverNet and key feature selection with LightGBM. Statistical analysis is performed to assess the importance of selected features, ensuring an interpretable relationship between morphological characteristics and classification outcomes. This approach enables the quantitative interpretation of which regions and features are critical for the model's decision-making, without sacrificing accuracy.
Results: Experiments on pathology images of hematoxylin-and-eosin-stained HCC tissue sections showed that the proposed method effectively identified key discriminatory regions and features, such as nuclear size, chromatin density, and shape irregularity. The clustering-based analysis provided structured insights into morphological patterns influencing classification, with explanations evaluated as clinically relevant and interpretable by a pathologist.
Conclusions: Our QXAI framework enhances the interpretability of DL-based pathology analysis by linking morphological features to classification decisions. This fosters trust in DL models and facilitates their clinical integration.
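The latent-space clustering step can be sketched with a minimal k-means on 2-D points. Real embeddings are high-dimensional DL features and the abstract does not name the clustering algorithm, so both the algorithm choice and the data below are illustrative assumptions:

```python
# Minimal k-means to illustrate grouping regions in an embedding space
# before characterizing each cluster morphometrically. Fixed initial
# centroids make the run deterministic; the 2-D "embeddings" are made up.

def kmeans(points, centroids, iters=10):
    groups = [[] for _ in centroids]
    for _ in range(iters):
        # assign each point to its nearest centroid (squared distance)
        groups = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            groups[d.index(min(d))].append(p)
        # recompute each centroid as the mean of its group
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups

points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
centroids, groups = kmeans(points, centroids=[(0.0, 0.0), (5.0, 5.0)])
# two tight groups recovered, one near the origin and one near (5, 5)
```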
Purpose: Segmenting intraglomerular tissue and glomerular lesions traditionally depends on detailed morphological evaluations by expert nephropathologists, a labor-intensive process susceptible to interobserver variability. Our group previously developed the Glo-In-One toolkit for integrated glomerulus detection and segmentation. Here, we extend the toolkit to version 2 (Glo-In-One-v2), which adds fine-grained segmentation capabilities. We curated 14 distinct labels spanning tissue regions, cells, and lesions across 23,529 annotated glomeruli from human and mouse histopathology data. To our knowledge, this dataset is among the largest of its kind to date.
Approach: We present a single dynamic-head deep learning architecture for segmenting 14 classes within partially labeled images from human and mouse kidney pathology. The model was trained on data derived from 368 annotated kidney whole-slide images with five key intraglomerular tissue types and nine glomerular lesion types.
Results: The glomerulus segmentation model performed well against the baselines, achieving a 76.5% average Dice similarity coefficient. In addition, transfer learning from rodent to human data improved the average accuracy of the glomerular lesion segmentation model by more than 3% in Dice score across lesion types.
Conclusions: We introduce a convolutional neural network for multiclass segmentation of intraglomerular tissue and lesions. The Glo-In-One-v2 model and pretrained weights are publicly available at https://github.com/hrlblab/Glo-In-One_v2.
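The Dice similarity coefficient reported above (e.g., 76.5% on average) measures overlap between a predicted and a reference mask. A minimal sketch, with masks flattened to binary lists for simplicity:

```python
# Dice similarity coefficient: 2 * |P ∩ R| / (|P| + |R|) for binary masks.
# Returns 1.0 when both masks are empty (a common convention).

def dice(pred, ref):
    inter = sum(p and r for p, r in zip(pred, ref))  # overlapping foreground
    total = sum(pred) + sum(ref)                     # total foreground pixels
    return 2.0 * inter / total if total else 1.0

score = dice([1, 1, 0, 1], [1, 0, 0, 1])  # 2*2 / (3 + 2) = 0.8
```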
Purpose: Digital breast tomosynthesis (DBT) achieves high sensitivity, but reading DBT is time-consuming. However, synthetic mammography (SM) images, equivalent to digital mammography (DM), can be generated from DBT images; SM is faster to read and may be sufficient in many cases. We investigate using artificial intelligence (AI) to stratify examinations into reading of either SM or DBT, minimizing workload while maximizing accuracy.
Approach: This is a retrospective study based on double-read paired DM and one-view DBT from the Malmö Breast Tomosynthesis Screening Trial. DBT examinations were analyzed with the cancer detection AI system ScreenPoint Transpara 1.7. For low-risk examinations, SM reading was simulated by assuming equality with DM reading. For high-risk examinations, the DBT reading results were used. Different combinations of single and double reading were studied.
Results: By double-reading the DBT of the 30% (4452/14,772) of cases with the highest risk, and single-reading SM for the rest, 122 cancers would be detected with the same reading workload as DM double reading. That is, 28% (27/95) more cancers would be detected than with DM double reading, and in total, 96% (122/127) of the cancers detectable with full DBT double reading would be found.
Conclusions: In a DBT-based screening program, AI could be used to select high-risk cases where the reading of DBT is valuable, whereas SM is sufficient for low-risk cases. Substantially more cancers could be detected than with DM alone, with only a limited increase in reading workload. Prospective studies are necessary.
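The workload trade-off can be modeled with back-of-the-envelope arithmetic: double-read DBT for the top-risk fraction, single-read SM for everyone else. The relative reading time of DBT versus SM (2x here) is a made-up assumption for illustration, not a figure from the study:

```python
# Reading-workload model for AI-based stratification.
# Assumed (NOT from the study): reading one DBT takes 2x as long as
# reading one SM or DM examination.

def workload(n_total, n_high_risk, dbt_time=2.0, sm_time=1.0):
    double_dbt = 2 * n_high_risk * dbt_time        # two readers on DBT
    single_sm = (n_total - n_high_risk) * sm_time  # one reader on SM
    return double_dbt + single_sm

strategy = workload(14772, 4452)  # stratified strategy, study's case counts
baseline = 2 * 14772 * 1.0        # DM double reading, DM time ~ SM time
# under these assumed time weights, the stratified strategy stays below
# the DM double-reading workload
```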
Purpose: Breast density estimation is an important part of breast cancer risk assessment, as mammographic density is associated with risk. However, density assessed by multiple experts can be subject to high inter-observer variability, so automated methods are increasingly used. We investigate the inter-reader variability and risk prediction for expert assessors and a deep learning approach.
Approach: Screening data from a case-control-matched cohort of 1328 women were used to compare two expert readers with each other and a single reader with a deep learning model, the Manchester artificial intelligence - visual analog scale (MAI-VAS). Bland-Altman analysis was used to assess variability, and the matched concordance index was used to assess risk prediction.
Results: Although the mean differences for the two comparisons were alike, the limits of agreement between MAI-VAS and a single reader were substantially narrower, at +SD (standard deviation) 21 (95% CI: 19.65, 21.69) and -SD 22 (95% CI: , ), than between two expert readers, at +SD 31 (95% CI: 32.08, 29.23) and -SD 29 (95% CI: , ). In addition, breast cancer risk discrimination for the deep learning method and density readings from a single expert was similar, with matched concordance indices of 0.628 (95% CI: 0.598, 0.658) and 0.624 (95% CI: 0.595, 0.654), respectively. The automated method had inter-view agreement similar to the experts' and maintained consistency across density quartiles.
Conclusions: The artificial intelligence breast density assessment tool MAI-VAS agrees better with a randomly selected expert reader than two expert readers agree with each other. Deep learning-based density methods provide consistent density scores without compromising breast cancer risk discrimination.
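The Bland-Altman limits of agreement used above are the mean of the paired differences plus or minus 1.96 standard deviations of those differences. A minimal sketch; the density readings below are invented for illustration:

```python
# Bland-Altman limits of agreement for paired density readings.
from statistics import mean, stdev

def limits_of_agreement(a, b):
    diffs = [x - y for x, y in zip(a, b)]  # paired reading differences
    bias = mean(diffs)                     # mean difference (bias)
    spread = 1.96 * stdev(diffs)           # 1.96 sample standard deviations
    return bias, bias - spread, bias + spread

# Invented VAS-style density readings from two assessors.
bias, lower, upper = limits_of_agreement([40, 55, 62, 30], [38, 60, 60, 28])
# narrower (lower, upper) bounds mean better inter-reader agreement
```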
Purpose: Predictive models for contrast-enhanced mammography often perform better at detecting and classifying enhancing masses than (non-enhancing) microcalcification clusters. We aim to investigate whether incorporating synthetic data with simulated microcalcification clusters during training can enhance model performance.
Approach: Microcalcification clusters were simulated in low-energy images of lesion-free breasts from 782 patients, considering local texture features. Enhancement was simulated in the corresponding recombined images. A deep learning (DL) model for lesion detection and classification was trained with varying ratios of synthetic and real (850 patients) data. In addition, a handcrafted radiomics classifier was trained using delineations and class labels from real data, and predictions from both models were ensembled. Validation was performed on internal (212 patients) and external (279 patients) real datasets.
Results: The DL model trained exclusively with synthetic data detected over 60% of malignant lesions. Adding synthetic data to smaller real training sets improved detection sensitivity for malignant lesions but decreased precision. Performance plateaued at a detection sensitivity of 0.80. The ensembled DL and radiomics models performed worse than the standalone DL model, decreasing the area under the receiver operating characteristic curve from 0.75 to 0.60 on the external validation set, likely due to falsely detected suspicious regions of interest.
Conclusions: Synthetic data can enhance DL model performance, provided model setup and data distribution are optimized. The possibility to detect malignant lesions without real data present in the training set confirms the utility of synthetic data. It can serve as a helpful tool, especially when real data are scarce, and it is most effective when complementing real data.
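The sensitivity/precision trade-off described above comes directly from detection counts. A minimal sketch with invented counts:

```python
# Detection sensitivity (recall) and precision from counts of true
# positives (tp), false negatives (fn), and false positives (fp).
# More detections can raise sensitivity while extra false positives
# lower precision. The counts below are invented for illustration.

def sensitivity_precision(tp, fn, fp):
    sens = tp / (tp + fn) if tp + fn else 0.0  # found / all real lesions
    prec = tp / (tp + fp) if tp + fp else 0.0  # found / all detections
    return sens, prec

sens, prec = sensitivity_precision(tp=80, fn=20, fp=40)
# sens == 0.8; prec == 80/120, i.e. sensitivity can hit 0.80 while
# precision drops as false detections accumulate
```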
Purpose: Accurate assessment of breast density is central to breast cancer risk estimation, in part because dense tissue can mask lesions. Although governed by standardized guidelines, radiologist assessment of breast density remains highly variable. Automated breast density assessment tools leverage deep learning but are limited by model robustness and interpretability.
Approach: We assessed the robustness of a feature selection methodology (RFE-SHAP) for classifying breast density grades using tissue-specific radiomic features extracted from raw central projections of digital breast tomosynthesis screenings ( , ). RFE-SHAP leverages traditional and explainable AI methods to identify highly predictive and influential features. A simple logistic regression (LR) classifier was used to assess classification performance, and unsupervised clustering was employed to investigate the intrinsic separability of density grade classes.
Results: LR classifiers yielded cross-validated areas under the receiver operating characteristic (AUCs) per density grade of [ : , : , : , : ] and an AUC of for classifying patients as nondense or dense. In external validation, we observed per density grade AUCs of [ : 0.880, : 0.779, : 0.878, : 0.673] and nondense/dense AUC of 0.823. Unsupervised clustering highlighted the ability of these features to characterize different density grades.
Conclusions: Our RFE-SHAP feature selection methodology for classifying breast tissue density generalized well to validation datasets after accounting for natural class imbalance, and the identified radiomic features properly captured the progression of density grades. Our results potentiate future research into correlating selected radiomic features with clinical descriptors of breast tissue density.
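The recursive feature elimination (RFE) loop at the heart of RFE-SHAP can be sketched as below. The real pipeline ranks radiomic features by SHAP values from a trained model and refits at each step; the precomputed "importance" dictionary and feature names here are stand-ins for illustration:

```python
# Toy RFE loop: repeatedly drop the feature with the lowest importance
# until k features remain. A real RFE-SHAP run would refit the model
# and recompute SHAP-based importances after each elimination.

def rfe(importance, k):
    features = dict(importance)
    while len(features) > k:
        weakest = min(features, key=features.get)
        del features[weakest]  # eliminate the least influential feature
    return sorted(features)

# Hypothetical radiomic features with made-up importance scores.
kept = rfe({"area": 0.9, "contrast": 0.2, "skewness": 0.05, "entropy": 0.6}, k=2)
# kept == ["area", "entropy"]
```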
Purpose: We aim to evaluate the change in mammographic density within individuals across screening rounds using automatic density software, to assess whether a change in breast density is associated with a future breast cancer diagnosis, and to provide insight into breast density evolution.
Approach: Mammographic breast density was analyzed in women screened in Malmö, Sweden, between 2010 and 2015 who had undergone at least two consecutive screening rounds months apart. The volumetric and area-based densities were measured with deep learning-based software and fully automated software, respectively. The change in volumetric breast density percentage (VBD%) between two consecutive screening examinations was determined. Multiple linear regression was used to investigate the association between VBD% change in percentage points and future breast cancer, as well as the initial VBD%, adjusting for age group and the time between examinations. Examinations with potential positioning issues were removed in a sensitivity analysis.
Results: In 26,056 included women, the mean VBD% decreased from 10.7% [95% confidence interval (CI) 10.6 to 10.8] to 10.3% (95% CI: 10.2 to 10.3) ( ) between the two examinations. The decline in VBD% was more pronounced in women with initially denser breasts (adjusted , ) and less pronounced in women with a future breast cancer diagnosis (adjusted , ).
Conclusions: The demonstrated density changes over time support the potential of using breast density change in risk assessment tools and provide insights for future risk-based screening.
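The core association tested above, regressing the change in VBD% on the initial VBD%, reduces in its simplest (unadjusted, single-covariate) form to a least-squares slope. The paired values below are invented; the study used multiple linear regression adjusted for age group and time between examinations:

```python
# Ordinary least-squares slope: cov(x, y) / var(x).
# x: initial VBD%, y: change in VBD% (percentage points).

def ols_slope(x, y):
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

# Invented data mimicking the reported direction of effect:
# initially denser breasts show a larger density decline.
slope = ols_slope([5, 10, 15, 20], [-0.1, -0.3, -0.5, -0.7])
# negative slope: higher baseline density, larger decrease
```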