Xiaoyi Ji, Richard Salmon, Nita Mulliqi, Umair Khan, Yinxi Wang, Anders Blilie, Henrik Olsson, Bodil Ginnerup Pedersen, Karina Dalsgaard Sørensen, Benedicte Parm Ulhøi, Svein R Kjosavik, Emilius A M Janssen, Mattias Rantalainen, Lars Egevad, Pekka Ruusuvuori, Martin Eklund, Kimmo Kartasalo
Title: Physical Color Calibration of Digital Pathology Scanners for Robust Artificial Intelligence Assisted Cancer Diagnosis
Journal: Modern Pathology, article 100715 (impact factor 7.1; JCR Q1, Pathology)
DOI: https://doi.org/10.1016/j.modpat.2025.100715
Published: 2025-01-16
Publication type: Journal Article
Citation count: 0
Abstract
The potential of artificial intelligence (AI) in digital pathology is limited by technical inconsistencies in the production of whole slide images (WSIs). These inconsistencies degrade AI performance and pose a challenge for widespread clinical application, as fine-tuning algorithms for each site is impractical. Changes in the imaging workflow can also compromise diagnostic accuracy and patient safety. Physical color calibration of scanners, relying on a biomaterial-based calibrant slide and a spectrophotometric reference measurement, has been proposed for standardizing WSI appearance, but its impact on AI performance has not been investigated. We evaluated whether physical color calibration can enable robust AI performance. We trained fully supervised and foundation-model-based AI systems for detecting and Gleason grading prostate cancer using WSIs of prostate biopsies from the STHLM3 clinical trial (n=3,651) and evaluated their performance in three external cohorts (n=1,161) with and without calibration. With physical color calibration, the fully supervised system's concordance with pathologists' grading (Cohen's linearly weighted kappa) improved from 0.439 to 0.619 in the Stavanger University Hospital cohort (n=860), from 0.354 to 0.738 in the Karolinska University Hospital cohort (n=229), and from 0.423 to 0.452 in the Aarhus University Hospital cohort (n=72). The foundation model's concordance improved from 0.547 to 0.670 (Stavanger), from 0.739 to 0.760 (Karolinska), and from 0.424 to 0.459 (Aarhus). The study demonstrates that physical color calibration provides a potential solution to the variation introduced by different scanners, making AI-based cancer diagnostics more reliable and applicable in diverse clinical settings.
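For readers unfamiliar with the concordance metric reported above, the following is a minimal sketch of Cohen's linearly weighted kappa for two raters on an ordinal scale. It is not the authors' evaluation code. The function name `linear_weighted_kappa`, the integer label encoding (0 to `num_categories - 1`, for example 0 = benign followed by increasing grade groups), and the sample ratings are all assumptions made for illustration.

```python
def linear_weighted_kappa(rater_a, rater_b, num_categories):
    """Cohen's kappa with linear weights for two raters.

    Labels are assumed (hypothetically) to be integers in
    0..num_categories-1 on an ordinal scale.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if num_categories < 2:
        raise ValueError("need at least two categories")

    n = len(rater_a)
    k = num_categories

    # Joint count table: observed[i][j] = cases rated i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal counts for each rater.
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight |i - j| / (k - 1): 0 on the diagonal,
    # 1 for the two most distant categories.
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)

    if expected_disagreement == 0.0:
        # Both raters used one identical category; kappa is undefined.
        raise ValueError("kappa is undefined when no disagreement is expected")

    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Made-up grades for six biopsies, purely illustrative.
    pathologist = [0, 1, 2, 3, 4, 5]
    ai_system = [0, 1, 3, 3, 5, 5]
    print(linear_weighted_kappa(pathologist, ai_system, num_categories=6))
```

With linear weights, a disagreement between adjacent grades is penalized less than one between distant grades, which suits an ordinal grading scale. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.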
About the journal:
Modern Pathology, an international journal under the ownership of The United States & Canadian Academy of Pathology (USCAP), serves as an authoritative platform for publishing top-tier clinical and translational research studies in pathology.
Original manuscripts are the primary focus of Modern Pathology, complemented by impactful editorials, reviews, and practice guidelines covering all facets of precision diagnostics in human pathology. The journal's scope includes advancements in molecular diagnostics and genomic classifications of diseases, breakthroughs in immune-oncology, computational science, applied bioinformatics, and digital pathology.