Background
Blurriness in whole slide images (WSIs) is a common issue in digital pathology. Whereas severe blurriness is known to degrade artificial intelligence (AI) model performance, the impact of typical levels of blurriness observed in real-world settings remains unclear.
Objectives
To evaluate the effect of WSI blurring on the robustness of AI predictions in real-world settings.
Methods
A retrospective study was conducted using 7529 WSIs and the corresponding AI predictions from 4 AI models trained on data from 2 scanners and 2 organs. The WSIs were categorized into concordant and discordant groups based on the accuracy of the AI prediction. Analyses included: (1) comparing blur metrics between the groups, (2) estimating the odds ratio relating the proportion of blurry patches in a WSI to prediction concordance, (3) assessing model performance across various blur intensities, and (4) examining the similarity of slide- and patch-level embeddings across focal planes using Z-stacks.
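As an illustration of one blur metric used in this study, the Laplacian variance, the following is a minimal sketch and not the authors' implementation. It assumes a grayscale patch stored as a list of rows, a 4-neighbour Laplacian kernel, and a 3x3 mean filter (`box_blur`, a hypothetical stand-in for the blur applied in the study, whose kernel is not specified here). Lower variance indicates a blurrier patch.

```python
from statistics import pvariance


def laplacian_variance(image):
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    `image` is a list of equal-length rows of grayscale intensities.
    Sharp patches have strong edges, so the response varies widely;
    blurred patches give a low variance.
    """
    rows, cols = len(image), len(image[0])
    responses = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            responses.append(
                image[r - 1][c] + image[r + 1][c]
                + image[r][c - 1] + image[r][c + 1]
                - 4 * image[r][c]
            )
    return pvariance(responses)


def box_blur(image):
    """3x3 mean filter with edge replication (illustrative blur only)."""
    rows, cols = len(image), len(image[0])
    blurred = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp to image bounds
                    cc = min(max(c + dc, 0), cols - 1)
                    total += image[rr][cc]
            out_row.append(total / 9)
        blurred.append(out_row)
    return blurred


# Synthetic 8x8 checkerboard as a stand-in for a sharp patch.
sharp = [[255 if (r + c) % 2 == 0 else 0 for c in range(8)] for r in range(8)]
soft = box_blur(sharp)
print(laplacian_variance(sharp), laplacian_variance(soft))
```

The absolute values depend on patch size, intensity scale, and kernel choice, so scores from this sketch are not comparable with the values reported in the abstract.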
Results
For each organ–scanner pair, neither the average wavelet score nor the Laplacian variance differed significantly between the concordant and discordant groups, and no significant association was observed between prediction concordance and the proportion of blurry regions (p > 0.05, except for one pair). Model performance remained robust even at a high blur level (radius = 1), at which the patch image had a Laplacian variance of 133.14 and a wavelet score of 1667.98, corresponding to the top 8.6% and 12.15% of blurriness in our dataset, respectively. In addition, embedding analysis across focal planes using Z-stacks showed that both patch- and slide-level representations were preserved up to ±3 μm, with slide-level embeddings consistently exhibiting cosine similarity values above 0.99.
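For readers unfamiliar with the cosine similarity used in the embedding comparison, here is a minimal sketch of how it is computed between two embedding vectors. The vectors below are made-up toy values rather than data from the study, and the function name is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy embeddings: an in-focus plane and a slightly perturbed off-focus plane.
in_focus = [0.12, -0.40, 0.88, 0.05]
off_focus = [0.11, -0.42, 0.86, 0.07]
print(cosine_similarity(in_focus, off_focus))
```

A value near 1 means the two embeddings point in almost the same direction, which is the sense in which representations are described as preserved across focal planes.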
Conclusions
These findings empirically suggest that the typical levels of WSI blurriness encountered in clinical practice may not significantly compromise the robustness of slide-level AI classification.
