
Journal of Pathology Informatics: Latest Articles

Attention induction based on pathologist annotations for improving whole slide pathology image classifier
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100413
Ryoichi Koga, Tatsuya Yokota, Koji Arihiro, Hidekata Hontani
We propose a method of attention induction to improve the attention mechanism in a whole slide image (WSI) classifier. Generally, only some regions in a WSI are useful for lesion classification, and the classifier must find and focus on those regions. Multiple instance learning and hierarchical representation learning are widely employed for WSI processing, and both use attention mechanisms to automatically find the useful regions before predicting the class. However, collecting a large number of WSIs is often impractical, and when the attention mechanism is trained on a small number of WSIs, the resulting attention often fails to focus on the useful regions. To improve the attention mechanism without increasing the number of training WSIs, we propose a method of attention induction for a hierarchical representation of WSIs that uses pathologists' coarse annotations to guide attention toward the regions useful for lesion classification. Our experimental results demonstrate that the proposed method improves the attention mechanism, thereby enhancing WSI classification performance.
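The idea of guiding attention with coarse annotations can be illustrated with a small sketch. The abstract does not state the loss the authors use, so the `induction_loss` term below (negative log of the attention mass falling on annotated patches) is an assumption chosen for illustration, as are the function names and the toy feature vectors. It shows only the general pattern: softmax attention pools patch features, and an auxiliary term rewards attention that lands inside the annotated region.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, scores):
    """Attention-weighted average of patch feature vectors."""
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights

def induction_loss(weights, annotated, eps=1e-12):
    """Hypothetical guidance term (not taken from the paper): negative log
    of the total attention placed on patches a pathologist marked."""
    mass = sum(w for w, a in zip(weights, annotated) if a)
    return -math.log(mass + eps)

# Toy slide with three patches; only the first lies in the annotated region.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
annotated = [True, False, False]

pooled, weights = attention_pool(features, [0.0, 0.0, 0.0])  # uniform attention
loss_uniform = induction_loss(weights, annotated)

_, focused = attention_pool(features, [3.0, 0.0, 0.0])       # attention on patch 0
loss_focused = induction_loss(focused, annotated)

print(loss_uniform, loss_focused)
```

In a real training loop a term of this kind would typically be added to the classification loss with a weighting factor, so that the annotations steer the attention without replacing the slide-level label as the main supervision signal.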
Citations: 0
Advancements in pathology: Digital transformation, precision medicine, and beyond
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100408
Sana Ahuja, Sufian Zaheer
Pathology, a cornerstone of medical diagnostics and research, is undergoing a revolutionary transformation fueled by digital technology, molecular biology advancements, and big data analytics. Digital pathology converts conventional glass slides into high-resolution digital images, enhancing collaboration and efficiency among pathologists worldwide. Integrating artificial intelligence (AI) and machine learning (ML) algorithms with digital pathology improves diagnostic accuracy, particularly in complex diseases like cancer. Molecular pathology, facilitated by next-generation sequencing (NGS), provides comprehensive genomic, transcriptomic, and proteomic insights into disease mechanisms, guiding personalized therapies. Immunohistochemistry (IHC) plays a pivotal role in biomarker discovery, refining disease classification and prognostication. Precision medicine integrates pathology's molecular findings with individual genetic, environmental, and lifestyle factors to customize treatment strategies, optimizing patient outcomes. Telepathology extends diagnostic services to underserved areas through remote digital pathology. Pathomics leverages big data analytics to extract meaningful insights from pathology images, advancing our understanding of disease pathology and therapeutic targets. Virtual autopsies employ non-invasive imaging technologies to revolutionize forensic pathology. These innovations promise earlier diagnoses, tailored treatments, and enhanced patient care. Collaboration across disciplines is essential to fully realize the transformative potential of these advancements in medical practice and research.
Citations: 0
Prioritizing cases from a multi-institutional cohort for a dataset of pathologist annotations
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100411
Victor Garcia, Emma Gardecki, Stephanie Jou, Xiaoxian Li, Kenneth R. Shroyer, Joel Saltz, Balazs Acs, Katherine Elfer, Jochen Lennerz, Roberto Salgado, Brandon D. Gallas

Objective

With growing interest in developing artificial intelligence and machine learning (AI/ML) models, having different developers use the same external validation dataset allows a direct comparison of model performance. Through our High Throughput Truthing project, we are creating a validation dataset for AI/ML models trained to assess stromal tumor-infiltrating lymphocytes (sTILs) in triple negative breast cancer (TNBC).

Materials and methods

We obtained clinical metadata for hematoxylin and eosin-stained glass slides and corresponding scanned whole slide images (WSIs) of TNBC core biopsies from two US academic medical centers. We selected regions of interest (ROIs) from the WSIs to target regions with various tissue morphologies and sTILs densities. Given the selected ROIs, we implemented a hierarchical rank-sort method for case prioritization.

Results

We received 122 glass slides and clinical metadata on 105 unique patients with TNBC. All received cases were female, and the mean age was 63.44 years. 60% of all cases were White patients, and 38.1% were Black or African American. After case prioritization, the skewness of the sTILs density distribution improved from 0.60 to 0.46 with a corresponding increase in the entropy of the sTILs density bins from 1.20 to 1.24. We retained cases with less prevalent metadata elements.
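The two balance statistics reported above can be computed as in the following sketch. The abstract does not say which skewness estimator, bin edges, or logarithm base were used, so the population (moment-based) skewness, the natural logarithm, and the bin edges here are assumptions, and the density values are made-up illustration data rather than study data.

```python
import math

def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def bin_entropy(values, edges):
    """Shannon entropy (natural log) of the histogram defined by `edges`.
    Bins are half-open [lo, hi) except the last, which includes its upper
    edge. Values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c)

# Hypothetical sTILs densities (percent) before and after prioritization.
all_cases = [2, 5, 5, 8, 10, 10, 12, 15, 20, 30, 45, 70]
prioritized = [2, 8, 12, 20, 30, 45, 55, 70]
edges = [0, 10, 40, 100]  # assumed low / medium / high density bins

for name, data in (("all", all_cases), ("prioritized", prioritized)):
    print(name, round(skewness(data), 2), round(bin_entropy(data, edges), 2))
```

A skewness closer to zero and a higher bin entropy both indicate that the retained cases are spread more evenly across the density range, which is the direction of change the abstract reports.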

Conclusion

This method allows us to prioritize underrepresented subgroups based on important clinical factors. In this manuscript, we discuss how we sourced the clinical metadata, selected ROIs, and developed our approach to prioritizing cases for inclusion in our pivotal study.
Citations: 0
Liver fibrosis classification on trichrome histology slides using weakly supervised learning in children and young adults
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100416
Mahdieh Shabanian, Zachary Taylor, Christopher Woods, Anas Bernieh, Jonathan Dillman, Lili He, Sarangarajan Ranganathan, Jennifer Picarsic, Elanchezhian Somasundaram

Background

Traditional liver fibrosis staging via percutaneous biopsy suffers from sampling bias and variable inter-pathologist agreement, highlighting the need for more objective techniques. Deep learning models for disease staging from medical images have shown potential to decrease diagnostic variability, with recent weakly supervised learning strategies showing promising results even with limited manual annotation.

Purpose

To study the clustering-constrained attention multiple instance learning (CLAM) approach for staging liver fibrosis on trichrome whole slide images (WSIs) of children and young adults.

Methods

This is an ethics board-approved retrospective study utilizing 217 trichrome WSIs from pediatric liver biopsies for model development and testing. Two pediatric pathologists scored the WSIs using two liver fibrosis staging systems, METAVIR and Ishak. Cases were then secondarily categorized into either high- or low-stage liver fibrosis and used for model development. The CLAM pipeline was used to develop binary classification models for histological liver fibrosis. Model performance was evaluated using area under the curve (AUC), accuracy, sensitivity, specificity, and Cohen's Kappa.
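For readers unfamiliar with the evaluation metrics named above, the sketch below computes them for a binary low- versus high-stage task using only the standard definitions. The labels and scores are invented toy values, and the 0.5 decision threshold and function names are assumptions. Nothing here reproduces the study's data or its evaluation code.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 = high stage."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def cohen_kappa(a, b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

def auc(y_true, scores):
    """Probability that a random positive outranks a random negative
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 4 high-stage and 4 low-stage slides with model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
area = auc(y_true, scores)
print(sensitivity, specificity, accuracy, kappa, area)
```

Kappa is the useful complement to accuracy in this setting because it discounts the agreement two raters would reach by chance, which is why it suits both model-versus-pathologist and pathologist-versus-pathologist comparisons.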

Results

The CLAM models showed strong diagnostic performance, with sensitivities up to 0.76 and AUCs up to 0.92 for distinguishing low- and high-stage fibrosis. The agreement between model predictions and average pathologist scores was moderate to substantial (Kappa: 0.57–0.69), whereas pathologist agreement on the METAVIR and Ishak scoring systems was only fair (Kappa: 0.39–0.46).

Conclusions

The CLAM pipeline showed promise in detecting features important for differentiating low- and high-stage fibrosis on trichrome WSIs, offering a potentially objective method for liver fibrosis detection in children and young adults.
Citations: 0
Leveraging pre-trained machine learning models for islet quantification in type 1 diabetes
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100406
Sanghoon Kang, Jesus D. Penaloza Aponte, Omar Elashkar, Juan Francisco Morales, Nicholas Waddington, Damon G. Lamb, Huiwen Ju, Martha Campbell-Thompson, Sarah Kim
Human islets display a high degree of heterogeneity in terms of size, number, architecture, and endocrine cell-type compositions. An ever-increasing number of immunohistochemistry-stained whole slide images (WSIs) are available through the online pathology database of the Network for Pancreatic Organ donors with Diabetes (nPOD) program at the University of Florida (UF). We aimed to develop an enhanced machine learning-assisted WSI analysis workflow to utilize the nPOD resource for analysis of endocrine cell heterogeneity in the natural history of type 1 diabetes (T1D) in comparison to donors without diabetes. To maximize usability, the user-friendly open-source software QuPath was selected for the main interface. The WSI data were analyzed with two pre-trained machine learning models (i.e., Segment Anything Model (SAM) and QuPath's pixel classifier), using the UF high-performance-computing cluster, HiPerGator. SAM was used to define precise endocrine cell and cell grouping boundaries (with an average quality score of 0.91 per slide), and the artificial neural network-based pixel classifier was applied to segment areas of insulin- or glucagon-stained cytoplasmic regions within each endocrine cell. An additional script was developed to automatically count CD3+ cells inside and within 20 μm of each islet perimeter to quantify the number of islets with inflammation (i.e., CD3+ T-cell infiltration). Proof-of-concept analysis was performed to test the developed workflow in 12 subjects using 24 slides. This open-source machine learning-assisted workflow enables rapid and high throughput determinations of endocrine cells, whether as single cells or within groups, across hundreds of slides. It is expected that the use of this workflow will accelerate our understanding of endocrine cell and islet heterogeneity in the context of T1D endotypes and pathogenesis.
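The CD3+ counting step described above (cells inside an islet or within 20 μm of its perimeter) comes down to a point-in-polygon test plus a point-to-boundary distance. The authors' script runs inside QuPath and is not reproduced here. The following is a standalone geometric sketch in which the function names, the square islet outline, and the cell coordinates (all in micrometers) are hypothetical.

```python
import math

def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies strictly inside the polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def dist_to_segment(pt, a, b):
    """Euclidean distance from pt to the line segment a-b."""
    px, py = pt
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def count_cells_near_islet(cells, islet, margin_um=20.0):
    """Count cell centroids inside the islet or within margin_um of its edge."""
    count = 0
    n = len(islet)
    for c in cells:
        if point_in_polygon(c, islet):
            count += 1
            continue
        d = min(dist_to_segment(c, islet[i], islet[(i + 1) % n]) for i in range(n))
        if d <= margin_um:
            count += 1
    return count

# Hypothetical 100 x 100 um square islet and five CD3+ cell centroids.
islet = [(0, 0), (100, 0), (100, 100), (0, 100)]
cells = [(50, 50), (110, 50), (130, 50), (-15, -15), (50, -20)]
n_cd3 = count_cells_near_islet(cells, islet)
print(n_cd3)
```

Using the perimeter distance rather than the distance to the islet centroid keeps the 20 μm margin uniform around irregularly shaped islets, which matters given the heterogeneity in islet size and architecture that the abstract emphasizes.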
Citations: 0
Deep learning-based classification of breast cancer molecular subtypes from H&E whole-slide images
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100410
Masoud Tafavvoghi, Anders Sildnes, Mehrdad Rakaee, Nikita Shvetsov, Lars Ailo Bongo, Lill-Tove Rasmussen Busund, Kajsa Møllersen
Classifying breast cancer molecular subtypes is crucial for tailoring treatment strategies. While immunohistochemistry (IHC) and gene expression profiling are standard methods for molecular subtyping, IHC can be subjective, and gene profiling is costly and not widely accessible in many regions. Previous approaches have highlighted the potential application of deep learning models on hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) for molecular subtyping, but these efforts vary in their methods, datasets, and reported performance. In this work, we investigated whether H&E-stained WSIs alone could be leveraged to predict breast cancer molecular subtypes (luminal A, luminal B, HER2-enriched, and basal). We used 1433 WSIs of breast cancer in a two-step pipeline: first, classifying tumor and non-tumor tiles to use only the tumor regions for molecular subtyping; and second, employing a One-vs-Rest (OvR) strategy to train four binary OvR classifiers and aggregating their results using an eXtreme Gradient Boosting model. The pipeline was tested on 221 hold-out WSIs, achieving an F1 score of 0.95 for tumor vs non-tumor classification and a macro F1 score of 0.73 for molecular subtyping. Our findings suggest that, with further validation, supervised deep learning models could serve as supportive tools for molecular subtyping in breast cancer. Our code is made available to facilitate ongoing research and development.
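The second pipeline step and its headline metric can be sketched as follows. One substitution should be stated plainly: the authors aggregate the four One-vs-Rest outputs with an eXtreme Gradient Boosting model, whereas this sketch uses a simple argmax as a stand-in so that it runs without external libraries. The probabilities, labels, and function names are invented for illustration.

```python
CLASSES = ["LumA", "LumB", "HER2", "Basal"]

def aggregate_ovr(prob_rows, classes):
    """Stand-in aggregator: pick the class whose binary OvR classifier is
    most confident. (The paper uses a gradient-boosting model here.)"""
    return [classes[max(range(len(classes)), key=lambda i: row[i])]
            for row in prob_rows]

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Each row holds hypothetical slide-level outputs of the four OvR classifiers.
ovr_probs = [
    [0.8, 0.3, 0.1, 0.2],
    [0.4, 0.6, 0.2, 0.1],
    [0.2, 0.1, 0.7, 0.3],
    [0.1, 0.2, 0.3, 0.9],
    [0.6, 0.5, 0.1, 0.1],  # a luminal B slide that looks like luminal A
]
y_true = ["LumA", "LumB", "HER2", "Basal", "LumB"]

y_pred = aggregate_ovr(ovr_probs, CLASSES)
score = macro_f1(y_true, y_pred, CLASSES)
print(y_pred, round(score, 3))
```

Macro averaging weights every subtype equally regardless of how many slides it has, so a rare subtype that is classified poorly lowers the score as much as a common one would.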
Citations: 0
Economic evaluation: Impact on costs, time, and productivity of the incorporation of integrative digital pathology (IDP) in the anatomopathological analysis of breast cancer in a national reference public provider in Chile
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100417
Rony Lenz-Alcayaga, Daniela Paredes-Fernández, Fancy Gaete Verdejo, Luciano Páez-Pizarro, Karla Hernández-Sánchez

Introduction

The incidence of breast cancer has risen in Chile, along with the complexity of diagnosis. For accurate diagnosis, it is necessary to complement the morphology assessed with hematoxylin and eosin with additional techniques to evaluate specific tumor markers. Evaluating the impact on costs, time, and productivity of automated techniques integrated with digital pathology solutions is crucial.

Objectives

To estimate the impact on costs, time, and productivity of incorporating the automation of the HER2 in situ hybridization technique combined with integrative digital pathology (IDP) in breast cancer diagnosis in a Chilean public provider versus a manual technique.

Methods

This economic evaluation adopted a health economics multi-method approach. A decision model was developed to represent the current manual fluorescence in situ hybridization (FISH) scenario versus an automated dual in situ hybridization (DISH) plus IDP in breast cancer diagnosis. Business process management (BPM) methodology was applied for capturing working time and latencies, in combination with a time-driven activity-based costing (TDABC) methodology for estimating direct, total, and average cost (2023 USD) for both scenarios for the following vectors: Human resources, supplies, and equipment, sorted by pre-analytical, analytical, and post-analytical phases. Indirect costs (2023 USD) were also retrieved. Both BPM and TDABC served to estimate labor productivity.

Results

In the baseline scenario based on manual FISH, the turnaround time (TAT) was estimated at 1259 min, at an average total cost of $265.67, considering direct and indirect costs for all phases. An average of 20.5 FISH reports were submitted per pathologist monthly during the baseline. The automated DISH plus IDP scenario consumed 203 min per biopsy, at an average total cost of $231.08, considering direct and indirect costs for all phases; it also showed an average of 22.8 submitted reports per pathologist monthly. This represents a decrease of 13.02% in average total costs, an 83.86% decrease in TAT, and an average labor productivity increase of 11.29%.
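The relative changes quoted above follow from the usual percentage-change formula, (after − before) / before × 100. The short check below applies it to the rounded figures given in the abstract. The helper name is an assumption. With these rounded inputs the cost change reproduces the reported 13.02%, while the turnaround-time and productivity changes come out at about 83.9% and 11.2%, close to but not identical to the reported 83.86% and 11.29%, presumably because the published percentages were derived from unrounded values.

```python
def percent_change(before, after):
    """Signed percentage change relative to the baseline value."""
    return (after - before) / before * 100.0

# Inputs as reported in the abstract (baseline manual FISH vs DISH plus IDP).
cost_change = percent_change(265.67, 231.08)      # average total cost, USD
tat_change = percent_change(1259, 203)            # turnaround time, minutes
productivity_change = percent_change(20.5, 22.8)  # reports per pathologist per month

print(f"cost: {cost_change:.2f}%")
print(f"turnaround time: {tat_change:.2f}%")
print(f"productivity: {productivity_change:.2f}%")
```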

Conclusions

The incorporation of automated DISH plus IDP in the pathology department of this public provider has resulted in reductions in the time required to perform the in situ hybridization technique, a decrease in total costs, and increased productivity. Particular attention should be given to adopting new technologies to accelerate processing times and workflow.
Citations: 0
Iris: A Next Generation Digital Pathology Rendering Engine
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100414
Ryan Erik Landvater, Ulysses Balis
Digital pathology is a tool of rapidly evolving importance within the discipline of pathology. Whole slide imaging promises numerous advantages; however, adoption is limited by challenges in ease of use and speed of high-quality image rendering relative to the simplicity and visual quality of glass slides. Herein, we introduce Iris, a new high-performance digital pathology rendering system. Specifically, we outline and detail the performance metrics of Iris Core, the core rendering engine technology. Iris Core comprises machine code modules written from the ground up in C++ and using Vulkan, a low-level and low-overhead cross-platform graphical processing unit application program interface, and our novel rapid tile buffering algorithms. We provide a detailed explanation of Iris Core's system architecture, including the stateless isolation of core processes, interprocess communication paradigms, and explicit synchronization paradigms that provide powerful control over the graphical processing unit. Iris Core achieves slide rendering at the sustained maximum frame rate on all tested platforms (120 FPS) and buffers an entire new slide field of view, without overlapping pixels, in 10 ms with enhanced detail in 30 ms. Further, it is able to buffer and compute high-fidelity reduction-enhancements for viewing low-power cytology with increased visual quality at a rate of 100–160 μs per slide tile, and with a cumulative median buffering rate of 1.36 GB of decompressed image data per second. This buffering rate allows for an entirely new field of view to be fully buffered and rendered in less than a single monitor refresh on a standard display, and high detail features within 2–3 monitor refresh frames. These metrics far exceed previously published specifications, beyond an order of magnitude in some contexts. 
The system shows no slowing with high use loads, but rather increases performance due to graphical processing unit cache control mechanisms and is “future-proof” due to near unlimited parallel scalability.
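To put the timing figures above in context, the following sketch converts refresh rates into per-frame time budgets and counts how many refresh intervals a buffering task spans. It assumes that a "standard display" means 60 Hz, which the abstract does not state, and the helper names are illustrative. It also treats the per-tile time as strictly sequential, which may not reflect how Iris Core actually schedules work.

```python
import math


def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per monitor refresh, in milliseconds."""
    return 1000.0 / refresh_hz


def refreshes_needed(task_ms: float, refresh_hz: float) -> int:
    """Number of whole refresh intervals a task of task_ms spans (ceiling)."""
    return math.ceil(task_ms / frame_budget_ms(refresh_hz))


STANDARD_HZ = 60.0  # assumption: a "standard display" refreshes at 60 Hz

print(f"60 Hz budget:  {frame_budget_ms(STANDARD_HZ):.2f} ms per frame")
print(f"120 Hz budget: {frame_budget_ms(120.0):.2f} ms per frame")

# Reported buffering times: 10 ms for a new field of view, 30 ms for enhanced detail.
print("new field of view:", refreshes_needed(10.0, STANDARD_HZ), "refresh(es)")
print("enhanced detail:  ", refreshes_needed(30.0, STANDARD_HZ), "refresh(es)")

# Reported per-tile time of 100-160 microseconds, read as a sequential rate.
tiles_per_second_low = 1_000_000 / 160
tiles_per_second_high = 1_000_000 / 100
print(f"tiles per second: {tiles_per_second_low:.0f} to {tiles_per_second_high:.0f}")
```

Under the 60 Hz assumption, a 10 ms buffer fits inside one 16.67 ms refresh interval and a 30 ms buffer spans two, which is consistent with the claims in the abstract.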
Citations: 0
A standards-based application for improving platelet transfusion workflow
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100412
William Gordon , Maria Aguad , Layne Ainsworth , Samuel Aronson , Jane Baronas , Edward Comeau , Rory De La Paz , Justin B.L. Halls , Vincent T. Ho , Michael Oates , Adam Landman , Wen Lu , Shawn N. Murphy , Fei Wang , Indira Guleria , Sean R. Stowell , Melissa Y. Yeung , Edgar L. Milford , Richard M. Kaufman , William J. Lane

Objective

Thrombocytopenia is a common complication of hematopoietic stem-cell transplantation (HSCT), though many patients will become immune refractory to platelet transfusions over time. We built and evaluated an electronic health record (EHR)-integrated, standards-based application that enables blood-bank clinicians to match platelet inventory with patients using data previously not available at the point-of-care, like human leukocyte antigen (HLA) data for donors and recipients.

Materials and methods

The web-based application launches as an EHR-embedded application or as a standalone application. The application coalesces disparate data streams into a unified view, including platelet count, HLA data, demographics, and real-time inventory. We looked at application usage over time and developed a multivariable logistic regression model to compute odds ratios that a patient undergoing HSCT would have a complicated thrombocytopenia course, with several model covariates including pre-/post-application deployment.
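For readers less familiar with how odds ratios come out of a logistic regression, the sketch below converts a fitted coefficient into an odds ratio with a Wald 95% confidence interval. The coefficient and standard error are hypothetical values for a binary pre-/post-deployment covariate and are not results from the study; the function name is also illustrative.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Convert a logistic-regression coefficient into (odds ratio, CI low, CI high).

    The odds ratio is exp(beta); the Wald interval is exp(beta +/- z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error for a "post-deployment" indicator.
or_, lo, hi = odds_ratio_with_ci(beta=-0.2, se=0.25)

# An interval that contains 1.0 means the effect is not significant at the 5% level.
significant = not (lo <= 1.0 <= hi)
print(f"OR={or_:.2f}, 95% CI=({lo:.2f}, {hi:.2f}), significant={significant}")
```

An odds ratio below 1 whose interval still spans 1 would point toward lower odds without reaching statistical significance.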

Results

Usage of the application has been consistent since launch, with a slight dip during the first COVID wave. Our model, which included 376 patients in the final analysis, did not demonstrate a significantly decreased odds that a patient would have a complicated thrombocytopenia course after application deployment as compared to before application deployment.

Discussion

We built an EHR-integrated application to improve platelet transfusion processes. Whereas our model did not demonstrate decreased odds of a patient having a complicated thrombocytopenia course, there are other workflow and clinical benefits that will benefit from future evaluation.

Conclusion

A web-based, EHR-integrated application was built and integrated into our EHR system and is now part of the standard operating procedures of our blood bank.
Citations: 0
Enhancing human phenotype ontology term extraction through synthetic case reports and embedding-based retrieval: A novel approach for improved biomedical data annotation
Q2 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.jpi.2024.100409
Abdulkadir Albayrak , Yao Xiao , Piyush Mukherjee , Sarah S. Barnett , Cherisse A. Marcou , Steven N. Hart
With the increasing utilization of exome and genome sequencing in clinical and research genetics, accurate and automated extraction of human phenotype ontology (HPO) terms from clinical texts has become imperative. Traditional methods for HPO term extraction, such as PhenoTagger, often face limitations in coverage and precision. In this study, we propose a novel approach that leverages large language models (LLMs) to generate synthetic sentences with clinical context, which were semantically encoded into vector embeddings. These embeddings are linked to HPO terms, creating a robust knowledgebase that facilitates precise information retrieval. Our method circumvents the known issue of LLM hallucinations by storing and querying these embeddings within a true database, ensuring accurate context matching without the need for a predictive model. We evaluated the performance of three different embedding models, all of which demonstrated substantial improvements over PhenoTagger. Top recall (sensitivity), precision (positive-predictive value, PPV), and F1 are 0.64, 0.64, and 0.64, respectively, which were 31%, 10%, and 21% better than PhenoTagger. Furthermore, optimal performance was achieved when we combined the best performing embedding model with PhenoTagger (a.k.a. Fused model), resulting in recall (sensitivity), precision (PPV), and F1 values of 0.7, 0.7, and 0.7, respectively, which are 10%, 10%, and 10% better than the best embedding models. Our findings underscore the potential of this integrated approach to enhance the precision and reliability of HPO term extraction, offering a scalable and effective solution for biomedical data annotation.
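The retrieval idea described above can be sketched as a nearest-neighbour lookup over stored embeddings, each linked to an HPO term. This is a minimal sketch and not the authors' implementation: the three-dimensional vectors are made up, the similarity threshold is an arbitrary assumption, and a real system would obtain embeddings from an embedding model and store them in a vector database. The HPO identifiers serve only as illustrative labels. The `f1` helper shows why equal precision and recall give an F1 of the same value, as in the reported 0.64 and 0.7 figures.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy knowledgebase of (embedding, HPO ID) pairs. The vectors are made up.
KNOWLEDGEBASE = [
    ([0.9, 0.1, 0.0], "HP:0001250"),  # illustrative label: seizure
    ([0.1, 0.9, 0.1], "HP:0001263"),  # illustrative label: global developmental delay
    ([0.0, 0.2, 0.9], "HP:0000252"),  # illustrative label: microcephaly
]


def retrieve(query: list[float], threshold: float = 0.8) -> list[str]:
    """Return HPO IDs whose stored embedding is at least `threshold` similar."""
    return [hpo for vec, hpo in KNOWLEDGEBASE if cosine(query, vec) >= threshold]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# A query vector standing in for an embedded clinical sentence.
print(retrieve([0.8, 0.2, 0.1]))
print(f"{f1(0.64, 0.64):.2f}", f"{f1(0.7, 0.7):.2f}")
```

Because the lookup returns only terms already stored in the knowledgebase, it cannot produce an identifier that was never indexed, which matches the abstract's point about avoiding a generative prediction step.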
Citations: 0