Path-BigBird: An AI-Driven Transformer Approach to Classification of Cancer Pathology Reports.

JCO Clinical Cancer Informatics, vol. 8, e2300148. Published 2024-02-01. DOI: 10.1200/CCI.23.00148

Mayanka Chandrashekar, Isaac Lyngaas, Heidi A Hanson, Shang Gao, Xiao-Cheng Wu, John Gounley

Abstract

Purpose: Surgical pathology reports are critical for cancer diagnosis and management. To accurately extract information about tumor characteristics from pathology reports in near real time, we explore the impact of domain-specific transformer models built from cancer pathology reports.

Methods: We built a pathology transformer model, Path-BigBird, using 2.7 million pathology reports from six SEER cancer registries. We then compared different variations of Path-BigBird with two less computationally intensive methods: a Hierarchical Self-Attention Network (HiSAN) classification model and an off-the-shelf clinical transformer model (Clinical BigBird). We used five pathology information extraction tasks for evaluation: site, subsite, laterality, histology, and behavior. Model performance was evaluated using macro and micro F₁ scores.
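The abstract reports both micro and macro F₁ without defining them. Below is a minimal Python sketch of the standard definitions for single-label multiclass classification. It assumes that every label seen in either the gold labels or the predictions counts toward the macro average, which is a common convention but not one the abstract confirms. The function name `f1_scores` and the toy site labels are hypothetical and illustrative; this is not the authors' evaluation code.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def f1_scores(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for single-label multiclass predictions.

    Hypothetical helper for illustration; not the paper's evaluation code.
    Scores are on a 0-1 scale; multiply by 100 to match the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")

    # Assumption: every label seen in either list counts toward the macro average.
    labels = set(y_true) | set(y_pred)

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[gold] += 1  # the true class gets a false negative

    def f1(t: int, p: int, n: int) -> float:
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Micro: pool counts across all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean,
    # so rare classes weigh as much as common ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    # Toy, imbalanced data with hypothetical site labels: 8 common, 2 rare.
    gold = ["C50"] * 8 + ["C34", "C18"]
    pred = ["C50"] * 10  # a model that always predicts the majority class
    micro, macro = f1_scores(gold, pred)
    print(round(micro, 3), round(macro, 3))  # → 0.8 0.296
```

Here a model that always predicts the majority class reaches a micro F₁ of 0.80 but a macro F₁ of about 0.30, because both rare classes score 0 and the macro average weighs them equally with the common class. This makes macro F₁ the stricter measure on tasks with many infrequent labels, which is consistent with the large micro/macro gaps reported for subsite and histology.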

Results: We found that Path-BigBird and Clinical BigBird outperformed HiSAN on all tasks. Clinical BigBird performed better on the site and laterality tasks. Versions of the Path-BigBird model performed best on the two most difficult tasks: subsite (micro F₁ score of 72.53, macro F₁ score of 35.76) and histology (micro F₁ score of 80.96, macro F₁ score of 37.94). The largest performance gains over the HiSAN model were for histology, where a Path-BigBird model increased the micro F₁ score by 1.44 points and the macro F₁ score by 3.55 points. Overall, the results suggest that a Path-BigBird model with a vocabulary derived from well-curated and deidentified data is the best-performing model.

Conclusion: The Path-BigBird pathology transformer model improves automated information extraction from pathology reports. Although Path-BigBird outperforms Clinical BigBird and HiSAN, these less computationally expensive models still have utility when resources are constrained.
