Developing a smart and scalable tool for histopathological education—PATe 2.0
Pub Date : 2025-12-05 DOI: 10.1016/j.jpi.2025.100535
Lina Winter , Annalena Artinger , Hendrik Böck , Vignesh Ramakrishnan , Bruno Reible , Jan Albin , Peter J. Schüffler , Georgios Raptis , Christoph Brochhausen
Digital microscopy plays a crucial role in pathology education, providing scalable and standardized access to learning resources. In response, we present PATe 2.0, a scalable, redeveloped web application succeeding the former PATe system from 2015. PATe 2.0 was developed using an agile, iterative process and built on a microservices architecture to ensure modularity, scalability, and reliability. It integrates a modern web-based user interface optimized for desktop and tablet use and automates key workflows such as whole-slide image upload and processing. Performance tests demonstrated that PATe 2.0 significantly reduces tile request times compared to PATe, despite handling larger tiles. The platform supports open formats and libraries such as DICOM and OpenSlide, enhancing its interoperability and adaptability across institutions. PATe 2.0 represents a robust digital microscopy solution for pathology education, enhancing usability, performance, and flexibility. Its design enables future integration of research algorithms and positions it as a pivotal tool for advancing pathology education and research.
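As a rough illustration of the tile-serving workflow such platforms automate, the sketch below reads Deep Zoom tiles from a whole-slide image with the openslide-python library; the file name, tile size, and output path are illustrative assumptions, not details of PATe 2.0.

```python
# Minimal sketch: extracting viewer tiles from a whole-slide image with
# openslide-python. Paths and tile geometry are hypothetical.
from openslide import OpenSlide
from openslide.deepzoom import DeepZoomGenerator

slide = OpenSlide("example.svs")  # hypothetical file; OpenSlide reads several open formats

# DeepZoomGenerator exposes the image pyramid as fixed-size tiles, the unit
# a web viewer typically requests from a tile server.
tiles = DeepZoomGenerator(slide, tile_size=512, overlap=0, limit_bounds=True)

level = tiles.level_count - 1          # highest-resolution Deep Zoom level
tile = tiles.get_tile(level, (0, 0))   # PIL.Image for tile column 0, row 0
tile.save("tile_0_0.jpeg", quality=85)
slide.close()
```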
{"title":"Developing a smart and scalable tool for histopathological education—PATe 2.0","authors":"Lina Winter , Annalena Artinger , Hendrik Böck , Vignesh Ramakrishnan , Bruno Reible , Jan Albin , Peter J. Schüffler , Georgios Raptis , Christoph Brochhausen","doi":"10.1016/j.jpi.2025.100535","DOIUrl":"10.1016/j.jpi.2025.100535","url":null,"abstract":"<div><div>Digital microscopy plays a crucial role in pathology education, providing scalable and standardized access to learning resources. In response, we present PATe 2.0, a scalable redeveloped web-application of the former PATe system from 2015. PATe 2.0 was developed using an agile, iterative process and built on a microservices architecture to ensure modularity, scalability, and reliability. It integrates a modern web-based user interface optimized for desktop and tablet use and automates key workflows such as whole-slide image uploads and processing. Performance tests demonstrated that PATe 2.0 significantly reduces tile request times compared to PATe, despite handling larger tiles. The platform supports open formats like DICOM and OpenSlide, enhancing its interoperability and adaptability across institutions. PATe 2.0 represents a robust digital microscopy solution in pathology education enhancing usability, performance, and flexibility. Its design enables future integration of research algorithms and highlights it as a pivotal tool for advancing pathology education and research.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"20 ","pages":"Article 100535"},"PeriodicalIF":0.0,"publicationDate":"2025-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145791667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Technical considerations during validation of the Genius® Digital Diagnostic System
Pub Date : 2025-11-19 DOI: 10.1016/j.jpi.2025.100532
Lakshmi Harinath , Sarah Harrington , Jonee Matsko , Amy Colaizzi , Esther Elishaev , Samer Khader , Rohit Bhargava , Chengquan Zhao , Liron Pantanowitz
Background
The aim of this study was to document technical errors encountered during validation of the Genius Digital Diagnostics System (GDDS).
Materials and methods
A total of 909 archived ThinPrep Pap slides from cases with follow-up biopsies were retrieved. Slides were cleaned, relabeled, and scanned with the GDDS. Digitization errors, comprising slide events and imager errors, were documented and evaluated.
Results
Of the 909 slides scanned, 21 (2.3%) demonstrated slide events: 5 slides had cell focus errors, 12 failed quality control (QC), 2 had barcode issues, 1 showed an oversaturated frame, and 1 was flagged as a duplicate. Some of these errors could be corrected, and 8 cases with various diagnostic cytology interpretations were successfully rescanned. The 13 (1.4%) cases that could not be scanned were excluded from the study; these failures were predominantly focus QC errors caused by coverslips scratched during long-term storage. There were 43 imager errors, including failure of motor movement, cancellation of slide-handling actions, and failure to pick slides from the carrier station, for which the scanning process had to be paused. Imager errors were resolved by rebooting the system, repositioning the slide, or obtaining technical help from the vendor.
Conclusion
Minor errors are to be expected when digitizing large volumes of Pap slides. The number of cases that required rescanning to address such technical problems was low and did not compromise the interpretation of Pap test slides using the GDDS.
{"title":"Technical considerations during validation of the Genius® Digital Diagnostic System","authors":"Lakshmi Harinath , Sarah Harrington , Jonee Matsko , Amy Colaizzi , Esther Elishaev , Samer Khader , Rohit Bhargava , Chengquan Zhao , Liron Pantanowitz","doi":"10.1016/j.jpi.2025.100532","DOIUrl":"10.1016/j.jpi.2025.100532","url":null,"abstract":"<div><h3>Background</h3><div>The aim of this study was to document technical errors encountered during validation of the Genius Digital Diagnostics System (GDDS).</div></div><div><h3>Materials and methods</h3><div>A total of 909 cases of archived ThinPrep Pap slides with follow-up biopsies were retrieved. Slides were cleaned, relabeled, and scanned with GDDS. Digital imager errors, including slide events and imager errors, were documented and evaluated.</div></div><div><h3>Results</h3><div>Of the 909 slides scanned, 21 (2.3<!--> <!-->%) demonstrated slide events. For 5 cases, the slides had cell focus errors, 12 failed due to quality control (QC) errors, 2 had barcode issues, 1 showed an oversaturated frame, and 1 presented a problem because it was a duplicate. Some errors could be corrected, of which 8 cases with various diagnostic cytology interpretations were successfully rescanned. There were 13 (1.4%) cases that could not be scanned and thus were excluded from the study, predominantly because of focus QC errors due to scratched coverslips from long-term storage. There were 43 imager errors including failure of motor movement, cancellation of slide handling action, and failure to pick slides from the carrier station for which the scanning process had to be paused. Imager errors were solved by rebooting the system, correcting the positioning of the slide on the system, and technical help provided by the vendor.</div></div><div><h3>Conclusion</h3><div>Minor errors are to be expected when digitizing large volume of Pap slides. Total number of rescanned cases to address such technical problems were low in number and did not compromise the interpretation of Pap test slides using GDDS.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"20 ","pages":"Article 100532"},"PeriodicalIF":0.0,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digital pathology imaging artificial intelligence in cancer research and clinical trials: An NCI workshop report
Pub Date : 2025-11-14 DOI: 10.1016/j.jpi.2025.100531
Hala R. Makhlouf , Miguel R. Ossandon , Keyvan Farahani , Irina Lubensky , Lyndsay N. Harris
Digital pathology imaging (DPI) is a rapidly advancing field with increasing relevance to cancer diagnosis, research, and clinical trials through large-scale image analysis and artificial intelligence (AI) integration. Despite these advances, regulatory adoption in digital pathology (DP) has lagged; to date, only three AI/ML Software as a Medical Device tools have received FDA clearance, highlighting a validation dataset gap rather than an absence of regulatory pathways. On March 6–7, 2024, the National Cancer Institute held a virtual workshop titled “Digital Pathology Imaging-Artificial Intelligence in Cancer Research and Clinical Trials,” bringing together experts in pathology, radiology, oncology, data science, and regulatory fields to assess current challenges, practical solutions, and future directions. This report summarizes expert opinions on key issues related to the use of DPI in cancer research and clinical trials, including data standardization, de-identification, and the application of Digital Imaging and Communications in Medicine (DICOM) standards. Key topics included data standardization, image quality assurance, validation strategies, AI applications, integration in clinical trials, biobanking, intellectual property, investigators' needs, and lessons from the digital cytology and radiology domains. Solutions discussed included adoption of open standards such as DICOM, centralized imaging portals, and scalable cloud-based platforms. The expert consensus outlined in this report is intended to guide the development of DPI infrastructure and standardization, support AI validation, and align regulatory and data-sharing practices to advance precision oncology.
{"title":"Digital pathology imaging artificial intelligence in cancer research and clinical trials: An NCI workshop report","authors":"Hala R. Makhlouf , Miguel R. Ossandon , Keyvan Farahani , Irina Lubensky , Lyndsay N. Harris","doi":"10.1016/j.jpi.2025.100531","DOIUrl":"10.1016/j.jpi.2025.100531","url":null,"abstract":"<div><div>Digital pathology imaging (DPI) is a rapidly advancing field with increasing relevance to cancer diagnosis, research, and clinical trials through large-scale image analysis and artificial intelligence (AI) integration. Despite these advances, regulatory adoption in digital pathology (DP) has lagged; to date, only three AI/ML Software as a Medical Device tool have received FDA clearance, highlighting a validation dataset gap rather than an absence of regulatory pathways. On March 6–7, 2024, the National Cancer Institute held a virtual workshop titled “Digital Pathology Imaging-Artificial Intelligence in Cancer Research and Clinical Trials,” bringing together experts in pathology, radiology, oncology, data science, and regulatory fields to assess current challenges, practical solutions, and future directions. This report summarizes expert opinions on key issues related to the use of DPI in cancer research and clinical trials, including data standardization, de-identification, and the application of Digital Imaging and Communication in Medicine (DICOM) standards. Key topics included data standardization, image quality assurance, validation strategies, AI applications, integration in clinical trials, biobanking, intellectual property, investigators' needs, and lessons from digital cytology and radiology domains. Solutions discussed included adoption of open standards such as DICOM, centralized imaging portals, and scalable cloud-based platforms. The expert consensus outlined in this report is intended to guide the development of DPI infrastructure, standardization, support AI validation, and align regulatory and data-sharing practices to advance precision oncology.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"20 ","pages":"Article 100531"},"PeriodicalIF":0.0,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weakly supervised deep learning-based detection of serous tubal intraepithelial carcinoma in fallopian tubes
Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100522
Andrew L. Valesano, Stephanie L. Skala , Mustafa Yousif
Serous tubal intraepithelial carcinoma (STIC) is an uncommon, non-invasive carcinoma that occurs more frequently in individuals with germline BRCA mutations and is an established precursor to high-grade serous ovarian carcinoma. STIC can be challenging to detect during pathologist evaluation, as it can manifest as a small focus of atypia in an otherwise benign salpingectomy specimen. There is a clinical need for scalable, weakly supervised computational approaches to aid in the detection of STIC. We developed a deep learning model to identify STIC and serous tubal intraepithelial lesions (STIL) in whole-slide images. We obtained fallopian tube specimens diagnosed as STIC (n = 49), STIL (n = 48), and benign fallopian tube (n = 83) at a single academic medical center. We trained a weakly supervised, attention-based multiple instance learning model and evaluated performance on independent datasets, including an additional unbalanced dataset (n = 40 benign, n = 2 STIL, n = 1 STIC) and cases diagnosed descriptively as benign reactive atypia (n = 53). The model achieved high sensitivity and specificity on the balanced validation cohort, with an area under the receiver operating characteristic curve (AUROC) of 0.96 (95% CI: 0.90–1.00), and demonstrated similarly strong performance on unbalanced validation cohorts (AUROC 0.98). Interpretability analyses indicated that model decisions were based on epithelial atypia. These results support the potential of integrating deep learning screening tools into clinical workflows to augment pathologist efficiency and diagnostic accuracy in the evaluation of fallopian tubes.
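For readers unfamiliar with the approach, the following is a minimal sketch of attention-based multiple instance learning pooling in PyTorch, in the spirit of Ilse et al. (2018); the layer sizes and three-class head are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative attention-based MIL pooling: per-tile embeddings are weighted
# by learned attention and summed into one slide-level representation.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=128, n_classes=3):
        super().__init__()
        # Scores one attention weight per tile embedding.
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1)
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, h):                             # h: (n_tiles, feat_dim) for one slide
        a = torch.softmax(self.attention(h), dim=0)   # (n_tiles, 1) attention weights
        z = (a * h).sum(dim=0)                        # attention-weighted slide embedding
        return self.classifier(z), a                  # slide logits + per-tile attention

# One "bag" of 1000 tile embeddings -> slide-level logits over, e.g.,
# {benign, STIL, STIC}; the attention vector supports interpretability analyses.
model = AttentionMIL()
logits, attn = model(torch.randn(1000, 512))
```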
{"title":"Weakly supervised deep learning-based detection of serous tubal intraepithelial carcinoma in fallopian tubes","authors":"Andrew L. Valesano, Stephanie L. Skala , Mustafa Yousif","doi":"10.1016/j.jpi.2025.100522","DOIUrl":"10.1016/j.jpi.2025.100522","url":null,"abstract":"<div><div>Serous tubal intraepithelial carcinoma (STIC) is an uncommon, non-invasive carcinoma that occurs more frequently in individuals with germline <em>BRCA</em> mutations and is an established precursor to high-grade serous ovarian carcinoma. STIC can be challenging to detect during pathologist evaluation, as it can manifest as a small focus of atypia in an otherwise benign salpingectomy specimen. There is a clinical need for scalable, weakly supervised computational approaches to aid in the detection of STIC. We developed a deep learning model to identify STIC and serous tubal intraepithelial lesions (STIL) in whole-slide images. We obtained fallopian tube specimens diagnosed as STIC (<em>n</em> = 49), STIL (<em>n</em> = 48), and benign fallopian tube (<em>n</em> = 83) at a single academic medical center. We trained a weakly supervised, attention-based multiple instance learning model and evaluated performance on independent datasets, including an additional unbalanced dataset (<em>n</em> = 40 benign, <em>n</em> = 2 STIL, <em>n</em> = 1 STIC) and cases diagnosed descriptively as benign reactive atypia (<em>n</em> = 53). The model achieved high sensitivity and specificity on the balanced validation cohort, with an area under the receiver operating characteristic curve (AUROC) of 0.96 (95% CI: 0.90–1.00), and demonstrated similarly strong performance on unbalanced validation cohorts (AUROC 0.98). Interpretability analyses indicated that model decisions were based on epithelial atypia. These results support the potential of integrating deep learning screening tools into clinical workflows to augment pathologist efficiency and diagnostic accuracy in fallopian tubes.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"19 ","pages":"Article 100522"},"PeriodicalIF":0.0,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145525750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The comparative pathology workbench: An update
Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100523
Michael N. Wicks , Michael Glinka , Bill Hill , Derek Houghton , Bernard Haggarty , Jorge Del-Pozo , Ingrid Ferreira , Florian Jaeckle , David Adams , Shahida Din , Irene Papatheodorou , Kathryn Kirkwood , Albert Burger , Richard A. Baldock , Mark J. Arends
The Comparative Pathology Workbench (CPW) is a web-browser-based visual analytics platform providing shared access to an interactive “spreadsheet” style presentation of image data and associated analysis data. The software was developed to enable pathologists and other clinical and research users to compare histopathological images of diseased and/or normal tissues between different samples of the same or different patients/species. The CPW provides a grid layout of cells in rows and columns so that images that correspond to matching data can be organized in the form of an image-enabled “spreadsheet”. An individual workbench or bench can be shared with other users with read-only or full edit access as required. In addition, each bench cell, and the bench as a whole, has an associated discussion thread to allow collaborative analysis and consensual interpretation of the data. Here, we present the updated system based on 2 years of active use in the field that generated constructive feedback. The updates deliver new capabilities, including automated importation of entire image collections, sorting of image collections, long-running tasks, public benches, upload of miscellaneous image types, refined search facilities, tagging support, and improved efficiency, speed, and user-friendliness.
{"title":"The comparative pathology workbench: An update","authors":"Michael N. Wicks , Michael Glinka , Bill Hill , Derek Houghton , Bernard Haggarty , Jorge Del-Pozo , Ingrid Ferreira , Florian Jaeckle , David Adams , Shahida Din , Irene Papatheodorou , Kathryn Kirkwood , Albert Burger , Richard A. Baldock , Mark J. Arends","doi":"10.1016/j.jpi.2025.100523","DOIUrl":"10.1016/j.jpi.2025.100523","url":null,"abstract":"<div><div>The Comparative Pathology Workbench (CPW) is a web-browser-based visual analytics platform providing shared access to an interactive “spreadsheet” style presentation of image data and associated analysis data. The software was developed to enable pathologists and other clinical and research users to compare histopathological images of diseased and/or normal tissues between different samples of the same or different patients/species. The CPW provides a grid layout of cells in rows and columns so that images that correspond to matching data can be organized in the form of an image-enabled “spreadsheet”. An individual workbench or bench can be shared with other users with read-only or full edit access as required. In addition, each bench cell or the whole bench itself has an associated discussion thread to allow collaborative analysis and consensual interpretation of the data. Here, we present the updated system based on 2 years of active use in the field that generated constructive feedback. The updates deliver new capabilities, including automated importation of entire image collections, sorting image collections, long running tasks, public benches, uploading miscellaneous image types, refining search facilities, enabling use of tags, and improving efficiency, speed, and user-friendliness.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"19 ","pages":"Article 100523"},"PeriodicalIF":0.0,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145525751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantifying partial pathological response rate in prostate cancer patients who underwent neoadjuvant chemotherapy using a novel morphometric approach
Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100528
Wei Huang , Huihua Li , Philipos Tsourkas , Sean Mcilwain , Irene Ong , Christos E. Kyriakopoulos , Brian Johnson , Steve Y. Cho , Shane A. Wells , Alejandro Roldan Alzate , David F. Jarrard , Erika Heninger , Joshua M. Lang
Accurate assessment of the partial pathological response rate (ppRR) to neoadjuvant chemotherapy (NAT) is critical for assessing therapeutic efficacy and for optimal clinical management. Because of the lack of an accurate estimate of baseline cancer burden (BCB), histological assessment of ppRR has never been attempted in the prostate. We present a novel morphometric approach for assessing ppRR in patients who underwent NAT and correlate ppRR with patient outcomes. A control cohort consisted of 39 NAT-naïve Caucasian patients who had high-risk prostate cancer (PCa; defined as Gleason Grade Group (GGG) >2) and an adequate biopsy sample (defined as a biopsy PCa area, including PCa epithelium and stroma, >2 mm²). A study cohort included 26 patients with high-risk PCa (defined as clinical stage T3a or higher, serum PSA >20 ng/mL, GGG of 4–5, or oligometastatic disease) who underwent androgen deprivation therapy plus docetaxel. Using the PCa epithelial-to-stromal ratio (E/S) as a metric, a surrogate BCB for the study cohort was predicted from the pre-treatment biopsy samples, and ppRR was calculated. Correlation analysis of ppRR with progression-free survival was performed using ppRR >80% as a cut-off.
Nine of the 26 patients in the study cohort experienced a significant response to NAT (ppRR >80%) using the PCa E/S-based approach, and these patients had significantly better progression-free survival (p = 0.006). ppRR to NAT can be reliably assessed using the PCa E/S as a surrogate metric from biopsy and radical prostatectomy (RP) samples, and ppRR can be used to predict patient outcomes.
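The abstract does not give the exact formula, but a plausible reading is that ppRR is the percent reduction of the measured residual cancer burden relative to the E/S-predicted baseline. A hedged sketch under that assumption, with made-up numbers:

```python
# Assumed definition (not stated in the abstract): ppRR is the percent
# reduction of residual cancer burden relative to the surrogate baseline
# cancer burden (BCB) predicted from the biopsy E/S ratio.

def partial_pathological_response_rate(predicted_bcb_mm2: float,
                                       residual_burden_mm2: float) -> float:
    """Percent reduction from the E/S-predicted baseline cancer burden."""
    return max(0.0, (1.0 - residual_burden_mm2 / predicted_bcb_mm2) * 100.0)

# Illustrative values only; the study's cut-off for a significant response
# in the progression-free survival analysis was ppRR > 80%.
pprr = partial_pathological_response_rate(predicted_bcb_mm2=150.0,
                                          residual_burden_mm2=20.0)
print(f"ppRR = {pprr:.1f}%, significant response: {pprr > 80.0}")
```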
{"title":"Quantifying partial pathological response rate in prostate cancer patients who underwent neoadjuvant chemotherapy using a novel morphometric approach","authors":"Wei Huang , Huihua Li , Philipos Tsourkas , Sean Mcilwain , Irene Ong , Christos E. Kyriakopoulos , Brian Johnson , Steve Y. Cho , Shane A. Wells , Alejandro Roldan Alzate , David F. Jarrard , Erika Heninger , Joshua M. Lang","doi":"10.1016/j.jpi.2025.100528","DOIUrl":"10.1016/j.jpi.2025.100528","url":null,"abstract":"<div><div>Accurate assessment of partial pathological response rate (ppRR) to neoadjuvant chemotherapy (NAT) is critical for assessing the efficacy of therapy and for optimal clinical management. Because of a lack of accurate estimation of baseline cancer burden, assessment of ppRR has never been attempted in prostate histologically. We presented a novel morphometric approach assessing ppRR in patients who underwent NAT and then correlated the ppRR with patients' outcomes. A control cohort consisted of 39 NAT-naïve Caucasian patients who had high-risk PCa (defined as Gleason Grade Group >2) and an adequate biopsy sample (defined as the size of the biopsy PCa area, including PCa epithelium and stroma >2 <sup>mm2</sup>). A study cohort included 26 patients with high-risk PCa (defined as clinical stage T3a or higher, serum PSA >20 ng/mL, or GGG of 4–5, or with oligometastatic disease) who underwent androgen deprivation therapy plus docetaxel. Using the PCa epithelial to stromal ratio (E/S) as a metric, surrogate BCB for the study cohort was predicted from the pre-treatment biopsy samples, and ppRR was calculated. Correlation analysis of patients' ppRR with progression-free survival was performed using ppRR >80% as a cut-off.</div><div>Nine of the 26 patients from the study cohort experienced a significant response to NAT (ppRR > 80%) using the PCa E/S-based approach, and these patients had significantly better progression-free survival (<em>p</em> = 0.006). ppRR to NAT can be reliably assessed using PCa E/S as a surrogate metric from biopsy and RP samples, and ppRR can be used to predict patients' outcomes.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"19 ","pages":"Article 100528"},"PeriodicalIF":0.0,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145690694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ki67 in cytological specimens of pancreatic neuroendocrine tumors: A literature review and validation of automated quantification
Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100527
Sahar Narimani, Sophie Pirenne, Birgit Weynand
Introduction
The Ki67 proliferation index is mandatory for grading, prognostication, and clinical decision-making in pancreatic neuroendocrine tumors (PanNETs). Automatic Ki67 quantification on cytology has been shown to be at least as accurate as the current gold-standard manual determination, while being less time-consuming and more consistent. After a thorough literature review, we aimed to validate the Visiopharm image analysis software for automatic Ki67 quantification on diagnostic cell block material from PanNETs.
Methods
We conducted a retrospective study and assembled a cohort of 69 PanNETs from clinical routine with available endoscopic ultrasound fine-needle aspiration cell blocks and Ki67- and synaptophysin-immunostained slides. The manual Ki67 index, if available, was obtained from the original pathology report; otherwise, a manual count was performed by a pathologist using a cell counter. The automatic Ki67 index was quantified through four consecutive algorithms from the Visiopharm image analysis software on aligned serial sections.
Results
Automatic Ki67 quantification showed a strong correlation with manual counting, with non-parametric Spearman correlation coefficients of r = 0.786 [95% confidence interval (CI): 0.650–0.873, p < 0.001] and r = 0.721 (95% CI: 0.558–0.830, p < 0.001) for absolute Ki67 values and grades, respectively. Grade concordance showed excellent agreement for Grade 1 and Grade 3 tumors (91.89% and 83.3%) and moderate agreement for Grade 2 lesions (59.09%) due to underestimation. Bland–Altman analysis yielded excellent results, with a mean underestimation of digital versus manual quantification of 0.2265%.
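Both agreement statistics reported above are standard and easy to reproduce; the sketch below computes a Spearman correlation and a Bland–Altman bias on made-up Ki67 values (the numbers are not the study's data).

```python
# Spearman correlation between manual and automatic Ki67 indices, plus the
# Bland-Altman mean difference (bias) with 95% limits of agreement.
import numpy as np
from scipy.stats import spearmanr

manual = np.array([1.2, 2.8, 4.5, 10.0, 22.0, 3.3, 55.0])   # % Ki67, manual count
digital = np.array([1.0, 3.1, 4.0, 8.5, 20.5, 3.0, 52.0])   # % Ki67, automated output

rho, p = spearmanr(manual, digital)          # rank-based, as in the study

diff = digital - manual
bias = diff.mean()                           # mean under/overestimation of digital vs. manual
loa = (bias - 1.96 * diff.std(ddof=1),       # 95% limits of agreement
       bias + 1.96 * diff.std(ddof=1))

print(f"Spearman r = {rho:.3f} (p = {p:.4f}); bias = {bias:.3f}%, LoA = {loa}")
```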
Conclusion
Our findings show accurate assessment of the proliferation index in PanNETs using the Visiopharm software for digital Ki67 quantification and provide a prevalidation framework for the implementation of this technique in pathology practice. Discrepancies were mainly seen in Grade 2 tumors, owing to their heterogeneity. Future research should therefore seek refinement of the digital algorithms and examine the reliability of prognosis and clinical endpoints based on this technique.
{"title":"Ki67 in cytological specimens of pancreatic neuroendocrine tumors: A literature review and validation of automated quantification","authors":"Sahar Narimani, Sophie Pirenne, Birgit Weynand","doi":"10.1016/j.jpi.2025.100527","DOIUrl":"10.1016/j.jpi.2025.100527","url":null,"abstract":"<div><h3>Introduction</h3><div>The Ki67 proliferation index is mandatory for grading, prognostication, and clinical decision-making in pancreatic neuroendocrine tumors (PanNETs). Automatic Ki67 quantification on cytology has been shown to be at least as accurate, less time-consuming, and more consistent than the current gold-standard manual determination. After a thorough literature review, we aimed to validate the Visiopharm image analysis software for automatic Ki67 quantification on diagnostic cell block material from PanNETs.</div></div><div><h3>Methods</h3><div>We conducted a retrospective study and assembled a cohort of 69 PanNETs from clinical routine with available endoscopic ultrasound fine needle aspiration cell block, Ki67, and synaptophysin immunostained slides. The manual Ki67 index, if available, was obtained from the original pathology report. Otherwise, a manual count was performed by a pathologist using a cell counter. The automatic Ki67 index was quantified through four consecutive algorithms from the Visiopharm Image Analysis software on aligned serial sections.</div></div><div><h3>Results</h3><div>Automatic Ki67 quantification showed a strong correlation with manual counting based on the non-parametric Spearman correlation coefficients of <em>r</em> <!-->=<!--> <!-->0.786 [95% confidence interval (CI): 0.650–0.873, <em>p</em> <!--><<!--> <!-->0.001] and <em>r</em> <!-->=<!--> <!-->0.721 (95% CI: 0.558–0.830, <em>p</em> <!--><<!--> <!-->0.001]<em>)</em>, for absolute Ki67 values and grades, respectively. Grade concordance showed excellent agreement for Grade 1 and Grade 3 tumors (91.89% and 83.3%) and rather moderate agreement for Grade 2 lesions (59.09%) due to underestimation. Bland–Altman analysis obtained excellent results, with a mean underestimation of digital versus manual quantification of 0.2265%.</div></div><div><h3>Conclusion</h3><div>Our findings show accurate assessment of the proliferation index from PanNETs using the Visiopharm software for digital Ki67 quantification and provide a prevalidation framework for the implementation of this technique in pathology practice. Discrepancies were mainly seen in Grade 2 tumors due to tumor heterogeneity of Grade 2 lesions. To this end, future research should seek refinement of the digital algorithms and examine the reliability of prognosis and clinical endpoints based on this technique.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"19 ","pages":"Article 100527"},"PeriodicalIF":0.0,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145690583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging large language models for structured information extraction from pathology reports
Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100521
Jeya Balaji Balasubramanian , Daniel Adams , Ioannis Roxanis , Amy Berrington de Gonzalez , Penny Coulson , Jonas S. Almeida , Montserrat García-Closas
Background
Structured information extraction from unstructured histopathology reports facilitates data accessibility for clinical research. Manual extraction by experts is time-consuming and expensive, limiting scalability. Large language models (LLMs) offer efficient automated extraction through zero-shot prompting, requiring only natural language instructions without labeled data or training. We evaluate LLMs' accuracy in extracting structured information from breast cancer histopathology reports, compared to manual extraction by a trained human annotator.
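Zero-shot prompting here simply means packing the instruction, target schema, and report text into one prompt, with no labeled examples or fine-tuning. A minimal sketch, with an illustrative three-field schema rather than the study's 51-feature data dictionary:

```python
# Building a zero-shot structured-extraction prompt. Feature names and the
# example report are hypothetical, for illustration only.
import json

SCHEMA = {
    "histologic_type": "string or null",
    "tumor_grade": "integer 1-3 or null",
    "er_status": "'positive' | 'negative' | null",
}

def build_extraction_prompt(report_text: str) -> str:
    return (
        "Extract the following fields from the breast cancer histopathology "
        "report below. Respond with JSON matching this schema exactly; use "
        f"null when a field is not stated.\n{json.dumps(SCHEMA, indent=2)}\n\n"
        f"Report:\n{report_text}"
    )

prompt = build_extraction_prompt("Invasive ductal carcinoma, grade 2. ER positive.")
# `prompt` would then be sent to the chosen LLM (e.g., GPT-4o or Llama 3.1).
```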
Methods
We developed the Medical Report Information Extractor, a web application leveraging LLMs for automated extraction. We also developed a gold-standard extraction dataset to evaluate the human annotator alongside five LLMs including GPT-4o, a leading proprietary model, and the Llama 3 model family, which allows self-hosting for data privacy. Our assessment involved 111 breast cancer histopathology reports from the Generations study, extracting 51 pathology features specified within the study's data dictionary.
Results
Evaluation against the gold-standard dataset showed that both Llama 3.1 405B (94.7% accuracy) and GPT-4o (96.1%) achieved extraction accuracy comparable to the human annotator (95.4%; p = 0.146 and p = 0.106, respectively). Whereas Llama 3.1 70B (91.6%) performed below human accuracy (p < 0.001), its reduced computational requirements make it a viable option for self-hosting.
Conclusion
We developed an open-source tool for structured information extraction that demonstrated expert human-level accuracy in our evaluation using state-of-the-art LLMs. The tool can be customized by non-programmers using natural language, and its modular design enables reuse for diverse extraction tasks, producing standardized, structured data that facilitates analytics through improved accessibility and interoperability.
{"title":"Leveraging large language models for structured information extraction from pathology reports","authors":"Jeya Balaji Balasubramanian , Daniel Adams , Ioannis Roxanis , Amy Berrington de Gonzalez , Penny Coulson , Jonas S. Almeida , Montserrat García-Closas","doi":"10.1016/j.jpi.2025.100521","DOIUrl":"10.1016/j.jpi.2025.100521","url":null,"abstract":"<div><h3>Background</h3><div>Structured information extraction from unstructured histopathology reports facilitates data accessibility for clinical research. Manual extraction by experts is time-consuming and expensive, limiting scalability. Large language models (LLMs) offer efficient automated extraction through zero-shot prompting, requiring only natural language instructions without labeled data or training. We evaluate LLMs' accuracy in extracting structured information from breast cancer histopathology reports, compared to manual extraction by a trained human annotator.</div></div><div><h3>Methods</h3><div>We developed the Medical Report Information Extractor, a web application leveraging LLMs for automated extraction. We also developed a gold-standard extraction dataset to evaluate the human annotator alongside five LLMs including GPT-4o, a leading proprietary model, and the Llama 3 model family, which allows self-hosting for data privacy. Our assessment involved 111 breast cancer histopathology reports from the Generations study, extracting 51 pathology features specified within the study's data dictionary.</div></div><div><h3>Results</h3><div>Evaluation against the gold-standard dataset showed that both Llama 3.1 405B (94.7% accuracy) and GPT-4o (96.1%) achieved extraction accuracy comparable to the human annotator (95.4%; <em>p</em> = 0.146 and <em>p</em> = 0.106, respectively). Whereas Llama 3.1 70B (91.6%) performed below human accuracy (<em>p</em> < 0.001), its reduced computational requirements make it a viable option for self-hosting.</div></div><div><h3>Conclusion</h3><div>We developed an open-source tool for structured information extraction that demonstrated expert human-level accuracy in our evaluation using state-of-the-art LLMs. The tool can be customized by non-programmers using natural language and the modular design enables reuse for diverse extraction tasks to produce standardized, structured data facilitating analytics through improved accessibility and interoperability.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"19 ","pages":"Article 100521"},"PeriodicalIF":0.0,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145578929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CONTEST: A generalization of ONEST to estimate sample size for predictive augmented intelligence method validation studies
Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100519
Benjamin K. Olson , Joseph H. Rosenthal , Ryan D. Kappedal , Niels H. Olson
Laboratories must verify and validate assays before reporting results in the clinical record. With the advent of machine learning algorithms, multiclass decision-support tools are coming online, but the FDA explicitly does not contemplate multiclass problems in its guidance for test validation. Validation requires, for a laboratory's patient population, evaluation of four performance characteristics against a reference method: accuracy, precision, reportable range, and reference intervals. In the absence of a reference method, proportion of agreement is the appropriate metric (Meier 2007). For subjective tests, the traditional metrics for precision fall under interrater reliability, which is well studied in the pathology literature (Gwet, Handbook of Interrater Reliability, 4th ed.). Recently, Guo and Han introduced an alternative framing, Observers Needed to Evaluate a Subjective Test (ONEST). This article introduces a treatment-effect extension of ONEST, Cases and Observers Needed to Evaluate a Subjective Test (CONTEST), and demonstrates that the agreement and disagreement distributions can be reasonably specified with parametric probability distributions such that the required sample size for a test, at a given significance level and power, can be calculated. We argue that this is an appropriate method to develop for validating tools used to augment a subjective test, given a prior set of cases, observers, and decisions, such as from another archive, cohort, or dataset, particularly in resource-constrained settings.
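To make the idea concrete, the sketch below simulates power when per-case agreement follows a beta distribution under null and alternative scenarios and searches for the smallest case count reaching 80% power; the distributions, parameters, and test are illustrative assumptions, not the authors' CONTEST procedure.

```python
# Illustrative simulation: parametric (beta) agreement distributions under a
# null and an alternative scenario, with sample size chosen by simulated power.
import numpy as np

rng = np.random.default_rng(0)

def power(n_cases, null_ab=(8, 2), alt_ab=(9, 1), alpha=0.05, n_sim=2000):
    # Critical value: upper (1 - alpha) quantile of mean agreement under the null.
    null_means = rng.beta(*null_ab, size=(n_sim, n_cases)).mean(axis=1)
    crit = np.quantile(null_means, 1 - alpha)
    # Power: fraction of alternative-scenario studies exceeding that value.
    alt_means = rng.beta(*alt_ab, size=(n_sim, n_cases)).mean(axis=1)
    return (alt_means > crit).mean()

n = next(n for n in range(5, 500, 5) if power(n) >= 0.80)
print(f"~{n} cases needed for 80% power at alpha = 0.05 (illustrative parameters)")
```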
{"title":"CONTEST: A generalization of ONEST to estimate sample size for predictive augmented intelligence method validation studies","authors":"Benjamin K. Olson , Joseph H. Rosenthal , Ryan D. Kappedal , Niels H. Olson","doi":"10.1016/j.jpi.2025.100519","DOIUrl":"10.1016/j.jpi.2025.100519","url":null,"abstract":"<div><div>Laboratories must verify and validate assays before reporting results in the clinical record. With the advent of machine learning algorithms, multiclass decision-support tools are coming online but the FDA explicitly does not contemplate multiclass problems in their guidance for test validation. Validation requires, for a laboratory's patient population, evaluation of four performance characteristics to a reference method: accuracy, precision, reportable range, and reference intervals. In the absence of a reference method, proportion of agreement is the appropriate metric (Meier 2007). For subjective tests, the traditional metrics for precision are in the area of interrater reliability, and interrater reliability is well studied in the pathology literature (“Gwet Handbook of Interrater Reliability 4th Ed.pdf,” n.d.). Recently, Guo and Han introduced an alternative framing, Observers Needed to Evaluate a Subjective Test (ONEST). This article introduces a treatment effect extension of ONEST, Cases and Observers Needed to Evaluate a Subjective Test (CONTEST) and demonstrates that the agreement and disagreement distributions can be reasonably specified with parametric probability distributions such that the required sample size for a test, at a given level and power, can be calculated. We argue that this would be an appropriate method to develop for validation of tools used to augment a subjective test, given a prior set of cases, observers, and decisions, such as from another archive, cohort, or dataset, particularly in resource-constrained settings.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"19 ","pages":"Article 100519"},"PeriodicalIF":0.0,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145465588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models
Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100520
Suparna Nanua , Raven Steward , Benjamin Neely , Michael Datto , Kenneth Youens
Large language models (LLMs) have demonstrated strong performance on general knowledge tasks, but they have important limitations as standalone tools for question answering in specialized domains where accuracy and consistency are critical. Retrieval-augmented generation (RAG) is a strategy in which LLM outputs are grounded in dynamically retrieved source documents, offering advantages in accuracy, explainability, and maintainability. We developed and evaluated a custom RAG system called Raven, designed to answer laboratory regulatory questions using the part of the Code of Federal Regulations (CFR) pertaining to laboratories (42 CFR Part 493) as an authoritative source. Raven employed a vector search pipeline and an LLM to generate grounded responses via a chatbot-style interface. The system was tested using 103 synthetic laboratory regulatory questions, 88 of which were explicitly addressed in the CFR. Compared to answers generated manually by a board-certified pathologist, Raven's responses were judged to be fully complete and correct in 92.0% of those 88 cases, with little irrelevant content and a low potential for regulatory or medical error. Performance declined significantly on questions not addressed in the CFR, confirming the system's grounding in the source documents. Most suboptimal responses were attributable to faulty source document retrieval rather than model hallucination or misinterpretation. These findings demonstrate that a basic RAG system can produce useful, accurate, and verifiable answers to complex regulatory questions. With appropriate safeguards and thoughtful integration into user workflows, tools like Raven may serve as valuable decision-support systems in laboratory medicine and other knowledge-intensive healthcare domains.
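The retrieval step of such a pipeline can be sketched in a few lines: pre-embed the regulation sections, embed the query the same way, and paste the top-scoring sections into the prompt as grounding context. In the sketch below, the embed function is a stand-in for a real embedding model and the CFR excerpts are illustrative, not Raven's actual index.

```python
# Minimal vector-search retrieval for a RAG pipeline over regulation sections.
import numpy as np

def embed(text: str) -> np.ndarray:          # placeholder for a real embedding model
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)             # unit-norm, so dot product = cosine similarity

sections = {
    "493.1251": "Standard: Procedure manual ...",                               # illustrative excerpts
    "493.1236": "Standard: Evaluation of proficiency testing performance ...",
}
index = {sid: embed(txt) for sid, txt in sections.items()}   # pre-embedded corpus

def retrieve(query: str, k: int = 2):
    q = embed(query)
    scored = sorted(index, key=lambda sid: float(q @ index[sid]), reverse=True)
    return scored[:k]                        # top-k section IDs by similarity

top = retrieve("What must a laboratory's procedure manual include?")
context = "\n\n".join(f"42 CFR {sid}: {sections[sid]}" for sid in top)
prompt = f"Answer using only the following regulations:\n{context}\n\nQuestion: ..."
# `prompt` is then passed to the LLM so its answer stays grounded in the CFR text.
```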
{"title":"Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models","authors":"Suparna Nanua , Raven Steward , Benjamin Neely , Michael Datto , Kenneth Youens","doi":"10.1016/j.jpi.2025.100520","DOIUrl":"10.1016/j.jpi.2025.100520","url":null,"abstract":"<div><div>Large language models (LLMs) have demonstrated strong performance on general knowledge tasks, but they have important limitations as standalone tools for question answering in specialized domains where accuracy and consistency are critical. Retrieval-augmented generation (RAG) is a strategy in which LLM outputs are grounded in dynamically retrieved source documents, offering advantages in accuracy, explainability, and maintainability. We developed and evaluated a custom RAG system called Raven, designed to answer laboratory regulatory questions using the part of the Code of Federal Regulations (CFR) pertaining to laboratory (42 CFR Part 493) as an authoritative source. Raven employed a vector search pipeline and a LLM to generate grounded responses via a chatbot–style interface. The system was tested using 103 synthetic laboratory regulatory questions, 88 of which were explicitly addressed in the CFR. Compared to answers generated manually by a board-certified pathologist, Raven's responses were judged to be totally complete and correct in 92.0% of those 88 cases, with little irrelevant content and a low potential for regulatory or medical error. Performance declined significantly on questions not addressed in the CFR, confirming the system's grounding in the source documents. Most suboptimal responses were attributable to faulty source document retrieval rather than model hallucination or misinterpretation. These findings demonstrate that a basic RAG system can produce useful, accurate, and verifiable answers to complex regulatory questions. With appropriate safeguards and with thoughtful integration into user workflows, tools like Raven may serve as valuable decision-support systems in laboratory medicine and other knowledge-intensive healthcare domains.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"19 ","pages":"Article 100520"},"PeriodicalIF":0.0,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145415716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}