Journal of Pathology Informatics: Latest Articles

Developing a smart and scalable tool for histopathological education—PATe 2.0
Q2 Medicine Pub Date: 2025-12-05 DOI: 10.1016/j.jpi.2025.100535
Lina Winter, Annalena Artinger, Hendrik Böck, Vignesh Ramakrishnan, Bruno Reible, Jan Albin, Peter J. Schüffler, Georgios Raptis, Christoph Brochhausen
Digital microscopy plays a crucial role in pathology education, providing scalable and standardized access to learning resources. We present PATe 2.0, a scalable web application redeveloped from the original PATe system of 2015. PATe 2.0 was developed using an agile, iterative process and built on a microservices architecture to ensure modularity, scalability, and reliability. It integrates a modern web-based user interface optimized for desktop and tablet use and automates key workflows such as whole-slide image upload and processing. Performance tests demonstrated that PATe 2.0 significantly reduces tile request times compared with PATe, despite handling larger tiles. The platform supports open standards and tools such as DICOM and OpenSlide, enhancing its interoperability and adaptability across institutions. PATe 2.0 is a robust digital microscopy solution for pathology education that improves usability, performance, and flexibility. Its design enables future integration of research algorithms, positioning it as a tool for advancing pathology education and research.
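The abstract reports faster tile requests even though the tiles are larger. One factor that may contribute in tiled viewers generally is that larger tiles reduce the number of requests needed to cover a slide. The sketch below works through that arithmetic for a power-of-two image pyramid. The slide dimensions and the 256 and 512 pixel tile sizes are hypothetical, since the abstract states none, and the function is an illustration rather than PATe 2.0's implementation.

```python
import math


def pyramid_tile_counts(width, height, tile_size):
    """Count tiles per pyramid level, from full resolution downward.

    Each level halves the previous one (rounding up) until the whole
    image fits inside a single tile.
    """
    counts = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        counts.append(cols * rows)
        if cols == 1 and rows == 1:
            break
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return counts


# Hypothetical whole-slide image of 100,000 x 80,000 pixels.
small = pyramid_tile_counts(100_000, 80_000, 256)
large = pyramid_tile_counts(100_000, 80_000, 512)

print("256 px tiles at full resolution:", small[0])
print("512 px tiles at full resolution:", large[0])
print("total tiles, 256 px vs 512 px:", sum(small), sum(large))
```

Doubling the tile edge cuts the full-resolution tile count roughly fourfold, at the cost of more bytes per request, so whether it improves latency depends on the server and network.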
Journal of Pathology Informatics, Volume 20, Article 100535
Citations: 0
Technical considerations during validation of the Genius® Digital Diagnostic System
Q2 Medicine Pub Date: 2025-11-19 DOI: 10.1016/j.jpi.2025.100532
Lakshmi Harinath, Sarah Harrington, Jonee Matsko, Amy Colaizzi, Esther Elishaev, Samer Khader, Rohit Bhargava, Chengquan Zhao, Liron Pantanowitz

Background

The aim of this study was to document technical errors encountered during validation of the Genius Digital Diagnostics System (GDDS).

Materials and methods

A total of 909 cases of archived ThinPrep Pap slides with follow-up biopsies were retrieved. Slides were cleaned, relabeled, and scanned with GDDS. Digital imager errors, including slide events and imager errors, were documented and evaluated.

Results

Of the 909 slides scanned, 21 (2.3%) demonstrated slide events: 5 had cell focus errors, 12 failed due to quality control (QC) errors, 2 had barcode issues, 1 showed an oversaturated frame, and 1 was flagged as a duplicate. Some errors could be corrected, and 8 cases with various diagnostic cytology interpretations were successfully rescanned. There were 13 cases (1.4%) that could not be scanned and were therefore excluded from the study, predominantly because of focus QC errors due to scratched coverslips from long-term storage. There were also 43 imager errors, including failure of motor movement, cancellation of a slide handling action, and failure to pick slides from the carrier station, for which the scanning process had to be paused. Imager errors were resolved by rebooting the system, correcting the position of the slide on the system, and obtaining technical help from the vendor.

Conclusion

Minor errors are to be expected when digitizing a large volume of Pap slides. The number of cases rescanned to address such technical problems was low and did not compromise the interpretation of Pap test slides using GDDS.
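The counts reported in the Results can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the abstract; the variable names are illustrative, and reading the 8 rescanned and 13 excluded cases as the two outcomes of the 21 slide events is an inference from the totals, not something the abstract states outright.

```python
total_slides = 909

# Slide-event breakdown as reported in the Results.
slide_events = {
    "cell focus error": 5,
    "quality control error": 12,
    "barcode issue": 2,
    "oversaturated frame": 1,
    "duplicate slide": 1,
}

rescanned = 8   # successfully rescanned cases
excluded = 13   # cases that could not be scanned

event_total = sum(slide_events.values())
event_rate = 100 * event_total / total_slides
excluded_rate = 100 * excluded / total_slides

print(f"slide events: {event_total} ({event_rate:.1f}%)")
print(f"excluded: {excluded} ({excluded_rate:.1f}%)")
print("rescanned + excluded equals slide events:", rescanned + excluded == event_total)
```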
Journal of Pathology Informatics, Volume 20, Article 100532
Citations: 0
Digital pathology imaging artificial intelligence in cancer research and clinical trials: An NCI workshop report
Q2 Medicine Pub Date: 2025-11-14 DOI: 10.1016/j.jpi.2025.100531
Hala R. Makhlouf, Miguel R. Ossandon, Keyvan Farahani, Irina Lubensky, Lyndsay N. Harris
Digital pathology imaging (DPI) is a rapidly advancing field with increasing relevance to cancer diagnosis, research, and clinical trials through large-scale image analysis and artificial intelligence (AI) integration. Despite these advances, regulatory adoption in digital pathology (DP) has lagged; to date, only three AI/ML Software as a Medical Device tools have received FDA clearance, highlighting a validation dataset gap rather than an absence of regulatory pathways. On March 6–7, 2024, the National Cancer Institute held a virtual workshop titled “Digital Pathology Imaging-Artificial Intelligence in Cancer Research and Clinical Trials,” bringing together experts in pathology, radiology, oncology, data science, and regulatory fields to assess current challenges, practical solutions, and future directions. This report summarizes expert opinions on key issues related to the use of DPI in cancer research and clinical trials, including data standardization, de-identification, and the application of Digital Imaging and Communications in Medicine (DICOM) standards. Further topics included image quality assurance, validation strategies, AI applications, integration in clinical trials, biobanking, intellectual property, investigators' needs, and lessons from the digital cytology and radiology domains. Solutions discussed included adoption of open standards such as DICOM, centralized imaging portals, and scalable cloud-based platforms. The expert consensus outlined in this report is intended to guide the development and standardization of DPI infrastructure, support AI validation, and align regulatory and data-sharing practices to advance precision oncology.
Journal of Pathology Informatics, Volume 20, Article 100531
Citations: 0
Weakly supervised deep learning-based detection of serous tubal intraepithelial carcinoma in fallopian tubes
Q2 Medicine Pub Date: 2025-11-01 DOI: 10.1016/j.jpi.2025.100522
Andrew L. Valesano, Stephanie L. Skala, Mustafa Yousif
Serous tubal intraepithelial carcinoma (STIC) is an uncommon, non-invasive carcinoma that occurs more frequently in individuals with germline BRCA mutations and is an established precursor to high-grade serous ovarian carcinoma. STIC can be challenging to detect during pathologist evaluation, as it can manifest as a small focus of atypia in an otherwise benign salpingectomy specimen. There is a clinical need for scalable, weakly supervised computational approaches to aid in the detection of STIC. We developed a deep learning model to identify STIC and serous tubal intraepithelial lesions (STIL) in whole-slide images. We obtained fallopian tube specimens diagnosed as STIC (n = 49), STIL (n = 48), and benign fallopian tube (n = 83) at a single academic medical center. We trained a weakly supervised, attention-based multiple instance learning model and evaluated its performance on independent datasets, including an additional unbalanced dataset (n = 40 benign, n = 2 STIL, n = 1 STIC) and cases diagnosed descriptively as benign reactive atypia (n = 53). The model achieved high sensitivity and specificity on the balanced validation cohort, with an area under the receiver operating characteristic curve (AUROC) of 0.96 (95% CI: 0.90–1.00), and demonstrated similarly strong performance on the unbalanced validation cohorts (AUROC 0.98). Interpretability analyses indicated that model decisions were based on epithelial atypia. These results support the potential of integrating deep learning screening tools into clinical workflows to improve pathologist efficiency and diagnostic accuracy in the evaluation of fallopian tube specimens.
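The abstract does not describe the model architecture beyond "attention-based multiple instance learning". As a rough illustration of that general idea, the sketch below implements one common form of attention pooling, in which each tile embedding receives a learned score and the slide embedding is the softmax-weighted sum of the tiles. The dimensions, the random weights, and the names `attention_pool`, `V`, and `w` are assumptions for illustration, not the authors' model.

```python
import math
import random


def attention_pool(instances, V, w):
    """Pool a bag of tile embeddings into one slide-level embedding.

    instances: K embeddings, each a list of D floats
    V: hidden projection with L rows of D floats
    w: scoring vector of L floats
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wl * ul for wl, ul in zip(w, hidden)))
    # Numerically stable softmax across the tiles in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


rng = random.Random(0)
D, L, K = 4, 3, 5  # embedding size, hidden size, tiles per slide (all illustrative)
V = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(L)]
w = [rng.uniform(-1, 1) for _ in range(L)]
tiles = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(K)]

bag, weights = attention_pool(tiles, V, w)
print("attention weights:", [round(a, 3) for a in weights])
```

Because only a slide-level label supervises training, the attention weights are what make interpretability analyses possible: high-weight tiles are the ones that drove the slide-level decision.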
Journal of Pathology Informatics, Volume 19, Article 100522
Citations: 0
The comparative pathology workbench: An update
Q2 Medicine Pub Date: 2025-11-01 DOI: 10.1016/j.jpi.2025.100523
Michael N. Wicks, Michael Glinka, Bill Hill, Derek Houghton, Bernard Haggarty, Jorge Del-Pozo, Ingrid Ferreira, Florian Jaeckle, David Adams, Shahida Din, Irene Papatheodorou, Kathryn Kirkwood, Albert Burger, Richard A. Baldock, Mark J. Arends
The Comparative Pathology Workbench (CPW) is a web-browser-based visual analytics platform providing shared access to an interactive “spreadsheet”-style presentation of image data and associated analysis data. The software was developed to enable pathologists and other clinical and research users to compare histopathological images of diseased and/or normal tissues between different samples from the same or different patients or species. The CPW provides a grid layout of cells in rows and columns so that images that correspond to matching data can be organized in the form of an image-enabled “spreadsheet”. An individual workbench, or bench, can be shared with other users with read-only or full edit access as required. In addition, each bench cell and the bench as a whole has an associated discussion thread to allow collaborative analysis and consensus interpretation of the data. Here, we present the updated system, based on constructive feedback from 2 years of active use in the field. The updates deliver new capabilities, including automated import of entire image collections, sorting of image collections, long-running tasks, public benches, upload of miscellaneous image types, refined search facilities, and tags, along with improved efficiency, speed, and user-friendliness.
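The grid, sharing, and discussion features described above suggest a simple data model. The sketch below is one hypothetical way to represent it; the class names, field names, and the "read"/"edit" permission strings are invented for illustration and are not taken from the CPW code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class Cell:
    image_url: Optional[str] = None                       # image shown in this cell
    tags: List[str] = field(default_factory=list)
    thread: List[Comment] = field(default_factory=list)   # per-cell discussion


@dataclass
class Bench:
    title: str
    rows: int
    cols: int
    cells: Dict[Tuple[int, int], Cell] = field(default_factory=dict)
    thread: List[Comment] = field(default_factory=list)     # whole-bench discussion
    shared_with: Dict[str, str] = field(default_factory=dict)  # user -> "read" or "edit"

    def cell(self, row: int, col: int) -> Cell:
        """Return the cell at (row, col), creating an empty one on first access."""
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("cell outside the bench grid")
        return self.cells.setdefault((row, col), Cell())

    def can_edit(self, user: str) -> bool:
        return self.shared_with.get(user) == "edit"


bench = Bench(title="Colitis comparison", rows=2, cols=3)
bench.shared_with["reviewer"] = "read"
bench.cell(0, 1).image_url = "https://example.org/images/sample-1"
bench.cell(0, 1).thread.append(Comment("reviewer", "Crypt distortion visible here."))
print(len(bench.cells), bench.can_edit("reviewer"))
```

Storing cells sparsely in a dictionary keyed by (row, column) keeps empty grid positions free, which suits benches where only some cells hold images.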
Journal of Pathology Informatics, Volume 19, Article 100523
Citations: 0
Quantifying partial pathological response rate in prostate cancer patients who underwent neoadjuvant chemotherapy using a novel morphometric approach
Q2 Medicine Pub Date: 2025-11-01 DOI: 10.1016/j.jpi.2025.100528
Wei Huang, Huihua Li, Philipos Tsourkas, Sean Mcilwain, Irene Ong, Christos E. Kyriakopoulos, Brian Johnson, Steve Y. Cho, Shane A. Wells, Alejandro Roldan Alzate, David F. Jarrard, Erika Heninger, Joshua M. Lang
Accurate assessment of the partial pathological response rate (ppRR) to neoadjuvant chemotherapy (NAT) is critical for evaluating the efficacy of therapy and for optimal clinical management. Because baseline cancer burden (BCB) cannot be estimated accurately, histological assessment of ppRR in the prostate has not previously been attempted. We present a novel morphometric approach for assessing ppRR in patients who underwent NAT and correlate ppRR with patient outcomes. A control cohort consisted of 39 NAT-naïve Caucasian patients who had high-risk prostate cancer (PCa; defined as Gleason Grade Group [GGG] >2) and an adequate biopsy sample (defined as a biopsy PCa area, including PCa epithelium and stroma, of >2 mm²). A study cohort included 26 patients with high-risk PCa (defined as clinical stage T3a or higher, serum PSA >20 ng/mL, GGG of 4–5, or oligometastatic disease) who underwent androgen deprivation therapy plus docetaxel. Using the PCa epithelial-to-stromal ratio (E/S) as a metric, a surrogate BCB for the study cohort was predicted from the pre-treatment biopsy samples, and ppRR was calculated. Correlation analysis of patients' ppRR with progression-free survival was performed using ppRR >80% as the cut-off.
Nine of the 26 patients in the study cohort had a significant response to NAT (ppRR >80%) using the PCa E/S-based approach, and these patients had significantly better progression-free survival (p = 0.006). These results suggest that ppRR to NAT can be assessed using PCa E/S as a surrogate metric from biopsy and radical prostatectomy (RP) samples, and that ppRR can be used to predict patient outcomes.
Journal of Pathology Informatics, Volume 19, Article 100528
Citations: 0
Ki67 in cytological specimens of pancreatic neuroendocrine tumors: A literature review and validation of automated quantification
Q2 Medicine Pub Date: 2025-11-01 DOI: 10.1016/j.jpi.2025.100527
Sahar Narimani, Sophie Pirenne, Birgit Weynand

Introduction

The Ki67 proliferation index is mandatory for grading, prognostication, and clinical decision-making in pancreatic neuroendocrine tumors (PanNETs). Automatic Ki67 quantification on cytology has been shown to be at least as accurate, less time-consuming, and more consistent than the current gold-standard manual determination. After a thorough literature review, we aimed to validate the Visiopharm image analysis software for automatic Ki67 quantification on diagnostic cell block material from PanNETs.

Methods

We conducted a retrospective study and assembled a cohort of 69 PanNETs from clinical routine with available endoscopic ultrasound fine needle aspiration cell block, Ki67, and synaptophysin immunostained slides. The manual Ki67 index, if available, was obtained from the original pathology report. Otherwise, a manual count was performed by a pathologist using a cell counter. The automatic Ki67 index was quantified through four consecutive algorithms from the Visiopharm Image Analysis software on aligned serial sections.

Results

Automatic Ki67 quantification correlated strongly with manual counting, with non-parametric Spearman correlation coefficients of r = 0.786 (95% confidence interval [CI]: 0.650–0.873, p < 0.001) for absolute Ki67 values and r = 0.721 (95% CI: 0.558–0.830, p < 0.001) for grades. Grade concordance was excellent for Grade 1 and Grade 3 tumors (91.89% and 83.3%) and moderate for Grade 2 lesions (59.09%), owing to underestimation. Bland–Altman analysis showed a mean underestimation of 0.2265% for digital versus manual quantification.

Conclusion

Our findings show that the Visiopharm software accurately assesses the Ki67 proliferation index in PanNETs and provide a prevalidation framework for implementing this technique in pathology practice. Discrepancies were seen mainly in Grade 2 tumors and were attributed to tumor heterogeneity. Future research should therefore refine the digital algorithms and examine the reliability of prognosis and clinical endpoints based on this technique.
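To make the grading step concrete, the sketch below computes a Ki67 index, maps it to a grade, and measures grade agreement and mean bias between two sets of readings. The cut-offs (below 3%, 3 to 20%, above 20%) follow the widely used WHO Ki67 thresholds for pancreatic neuroendocrine tumors; the abstract does not state which thresholds were applied, so that choice is an assumption. The grading here deliberately ignores mitotic count and differentiation, and the paired readings are invented values for illustration, not study data.

```python
def ki67_index(positive_nuclei, total_nuclei):
    """Ki67 index as the percentage of tumor nuclei that stain positive."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    return 100.0 * positive_nuclei / total_nuclei


def ki67_grade(index_percent):
    """Map a Ki67 index to grade 1, 2, or 3 (assumed WHO cut-offs)."""
    if index_percent < 3.0:
        return 1
    if index_percent <= 20.0:
        return 2
    return 3


def mean_bias(manual, automatic):
    """Mean of (automatic - manual); negative means automatic reads lower."""
    diffs = [a - m for m, a in zip(manual, automatic)]
    return sum(diffs) / len(diffs)


# Invented paired readings (percent), for illustration only.
manual = [1.5, 4.0, 12.0, 25.0]
automatic = [1.2, 2.8, 11.5, 26.0]

grades_manual = [ki67_grade(v) for v in manual]
grades_auto = [ki67_grade(v) for v in automatic]
agreement = sum(gm == ga for gm, ga in zip(grades_manual, grades_auto)) / len(manual)
bias = mean_bias(manual, automatic)

print("manual grades:   ", grades_manual)
print("automatic grades:", grades_auto)
print(f"grade agreement: {agreement:.2f}, mean bias: {bias:.2f} points")
```

The second invented case shows how a small underestimate near the 3% boundary moves a tumor from Grade 2 to Grade 1, which is the kind of discrepancy the abstract reports for Grade 2 lesions.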
Journal of Pathology Informatics, volume 19, Article 100527. Published 2025-11-01. DOI: 10.1016/j.jpi.2025.100527.
Citations: 0
Leveraging large language models for structured information extraction from pathology reports
Q2 Medicine Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100521
Jeya Balaji Balasubramanian , Daniel Adams , Ioannis Roxanis , Amy Berrington de Gonzalez , Penny Coulson , Jonas S. Almeida , Montserrat García-Closas

Background

Structured information extraction from unstructured histopathology reports facilitates data accessibility for clinical research. Manual extraction by experts is time-consuming and expensive, limiting scalability. Large language models (LLMs) offer efficient automated extraction through zero-shot prompting, requiring only natural language instructions without labeled data or training. We evaluate LLMs' accuracy in extracting structured information from breast cancer histopathology reports, compared to manual extraction by a trained human annotator.

Methods

We developed the Medical Report Information Extractor, a web application leveraging LLMs for automated extraction. We also developed a gold-standard extraction dataset to evaluate the human annotator alongside five LLMs including GPT-4o, a leading proprietary model, and the Llama 3 model family, which allows self-hosting for data privacy. Our assessment involved 111 breast cancer histopathology reports from the Generations study, extracting 51 pathology features specified within the study's data dictionary.

Results

Evaluation against the gold-standard dataset showed that both Llama 3.1 405B (94.7% accuracy) and GPT-4o (96.1%) achieved extraction accuracy comparable to the human annotator (95.4%; p = 0.146 and p = 0.106, respectively). Although Llama 3.1 70B (91.6%) performed below human accuracy (p < 0.001), its reduced computational requirements make it a viable option for self-hosting.

Conclusion

We developed an open-source tool for structured information extraction that demonstrated expert human-level accuracy in our evaluation using state-of-the-art LLMs. The tool can be customized by non-programmers using natural language, and its modular design enables reuse for diverse extraction tasks. The resulting standardized, structured data facilitate analytics through improved accessibility and interoperability.
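As a rough illustration of how extraction accuracy against a gold-standard dataset might be scored, the sketch below counts matching (report, feature) cells. The scoring rule, the normalization step, the function names, and the feature names are all assumptions for illustration. The abstract does not describe the authors' exact metric.

```python
def normalize(value):
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return "" if value is None else str(value).strip().lower()


def extraction_accuracy(gold, predicted):
    """Fraction of (report, feature) cells where the prediction equals the gold value."""
    correct = 0
    total = 0
    for report_id, gold_fields in gold.items():
        predicted_fields = predicted.get(report_id, {})
        for feature, gold_value in gold_fields.items():
            total += 1
            if normalize(predicted_fields.get(feature)) == normalize(gold_value):
                correct += 1
    return correct / total if total else 0.0


# Hypothetical feature names and values, not taken from the study's data dictionary.
gold = {
    "report-1": {"grade": "2", "er_status": "Positive"},
    "report-2": {"grade": "3", "er_status": "Negative"},
}
predicted = {
    "report-1": {"grade": "2", "er_status": "positive "},
    "report-2": {"grade": "2", "er_status": "Negative"},
}

accuracy = extraction_accuracy(gold, predicted)
```

Scoring per cell rather than per report keeps one wrong feature from discarding the other correct features in the same report. A missing prediction counts as an error unless the gold value is also empty.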
Journal of Pathology Informatics, volume 19, Article 100521. Published 2025-11-01. DOI: 10.1016/j.jpi.2025.100521.
Citations: 0
CONTEST: A generalization of ONEST to estimate sample size for predictive augmented intelligence method validation studies
Q2 Medicine Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100519
Benjamin K. Olson , Joseph H. Rosenthal , Ryan D. Kappedal , Niels H. Olson
Laboratories must verify and validate assays before reporting results in the clinical record. With the advent of machine learning algorithms, multiclass decision-support tools are coming online, but the FDA explicitly does not contemplate multiclass problems in its guidance for test validation. Validation requires, for a laboratory's patient population, evaluation of four performance characteristics against a reference method: accuracy, precision, reportable range, and reference intervals. In the absence of a reference method, proportion of agreement is the appropriate metric (Meier 2007). For subjective tests, the traditional metrics for precision fall under interrater reliability, which is well studied in the pathology literature (Gwet, Handbook of Interrater Reliability, 4th ed.). Recently, Guo and Han introduced an alternative framing, Observers Needed to Evaluate a Subjective Test (ONEST). This article introduces a treatment effect extension of ONEST, Cases and Observers Needed to Evaluate a Subjective Test (CONTEST), and demonstrates that the agreement and disagreement distributions can be reasonably specified with parametric probability distributions such that the required sample size for a test, at a given level and power, can be calculated. We argue that this would be an appropriate method to develop for validation of tools used to augment a subjective test, given a prior set of cases, observers, and decisions, such as from another archive, cohort, or dataset, particularly in resource-constrained settings.
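The proportion-of-agreement idea behind ONEST can be sketched as follows. This is a simplified illustration, not the CONTEST sample-size calculation itself: it only estimates how overall agreement changes as observers are added, averaged over random observer orderings. The function names and the rating matrix are hypothetical.

```python
import random


def proportion_of_agreement(ratings, observers):
    """Share of cases on which all the given observers assign the same class."""
    agreeing = sum(1 for case in ratings if len({case[o] for o in observers}) == 1)
    return agreeing / len(ratings)


def agreement_curve(ratings, n_permutations=200, seed=0):
    """Mean agreement among the first k observers of random orderings, for k = 2..n."""
    rng = random.Random(seed)
    n_observers = len(ratings[0])
    sums = {k: 0.0 for k in range(2, n_observers + 1)}
    for _ in range(n_permutations):
        order = rng.sample(range(n_observers), n_observers)  # one random ordering
        for k in sums:
            sums[k] += proportion_of_agreement(ratings, order[:k])
    return {k: total / n_permutations for k, total in sums.items()}


# Invented ratings: rows are cases, columns are observers, values are class labels.
ratings = [
    ["A", "A", "A", "A"],
    ["A", "B", "A", "A"],
    ["B", "B", "B", "C"],
    ["C", "C", "C", "C"],
    ["A", "A", "B", "B"],
]

curve = agreement_curve(ratings)
```

Because the labels are compared as a set, the same code handles any number of classes, which is the multiclass setting the abstract is concerned with. Agreement can only stay level or fall as observers are added, so the curve is non-increasing in `k`.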
Journal of Pathology Informatics, volume 19, Article 100519. Published 2025-11-01. DOI: 10.1016/j.jpi.2025.100519.
Citations: 0
Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models
Q2 Medicine Pub Date : 2025-11-01 DOI: 10.1016/j.jpi.2025.100520
Suparna Nanua , Raven Steward , Benjamin Neely , Michael Datto , Kenneth Youens
Large language models (LLMs) have demonstrated strong performance on general knowledge tasks, but they have important limitations as standalone tools for question answering in specialized domains where accuracy and consistency are critical. Retrieval-augmented generation (RAG) is a strategy in which LLM outputs are grounded in dynamically retrieved source documents, offering advantages in accuracy, explainability, and maintainability. We developed and evaluated a custom RAG system called Raven, designed to answer laboratory regulatory questions using the part of the Code of Federal Regulations (CFR) pertaining to laboratories (42 CFR Part 493) as an authoritative source. Raven employed a vector search pipeline and an LLM to generate grounded responses via a chatbot-style interface. The system was tested using 103 synthetic laboratory regulatory questions, 88 of which were explicitly addressed in the CFR. Compared to answers generated manually by a board-certified pathologist, Raven's responses were judged to be totally complete and correct in 92.0% of those 88 cases, with little irrelevant content and a low potential for regulatory or medical error. Performance declined significantly on questions not addressed in the CFR, confirming the system's grounding in the source documents. Most suboptimal responses were attributable to faulty source document retrieval rather than model hallucination or misinterpretation. These findings demonstrate that a basic RAG system can produce useful, accurate, and verifiable answers to complex regulatory questions. With appropriate safeguards and with thoughtful integration into user workflows, tools like Raven may serve as valuable decision-support systems in laboratory medicine and other knowledge-intensive healthcare domains.
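The retrieval step of a RAG pipeline can be sketched in a few lines. This is a toy under stated assumptions, not Raven's implementation: a bag-of-words vector stands in for a learned embedding model, the passages are invented placeholders rather than CFR text, and every function name is hypothetical. The final LLM call is omitted.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector. A real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, top_k=2):
    """Return the top_k passages most similar to the question."""
    query = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(query, embed(p["text"])), reverse=True)
    return ranked[:top_k]


def build_prompt(question, retrieved):
    """Ground the model by placing the retrieved passages, with ids, ahead of the question."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in retrieved)
    return (
        "Answer using only the passages below and cite their ids.\n"
        f"{context}\n\nQuestion: {question}"
    )


# Invented placeholder passages, not quotations from any regulation.
passages = [
    {"id": "passage-a", "text": "Laboratories must retain quality control records for a defined period."},
    {"id": "passage-b", "text": "Personnel performing high complexity testing must meet qualification requirements."},
    {"id": "passage-c", "text": "Proficiency testing samples must be handled like patient specimens."},
]

question = "How long must quality control records be retained?"
retrieved = retrieve(question, passages)
prompt = build_prompt(question, retrieved)
```

Keeping passage ids in the prompt lets a reader check each answer against its source. It also makes retrieval failures visible, which matters given that the abstract attributes most suboptimal responses to faulty retrieval.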
Journal of Pathology Informatics, volume 19, Article 100520. Published 2025-11-01. DOI: 10.1016/j.jpi.2025.100520.
Citations: 0