Use of Natural Language Processing to Extract and Classify Papillary Thyroid Cancer Features From Surgical Pathology Reports
Ricardo Loor-Torres MD, Yuqi Wu PhD, Esteban Cabezas MD, Mariana Borras-Osorio MD, David Toro-Tobon MD, Mayra Duran MD, Misk Al Zahidy MS, Maria Mateo Chavez MD, Cristian Soto Jacome MD, Jungwei W. Fan PhD, Naykky M. Singh Ospina MD, Yonghui Wu PhD, Juan P. Brito MD
Endocrine Practice, Volume 30, Issue 11, Pages 1051-1058. Published 2024-11-01. DOI: 10.1016/j.eprac.2024.08.008
https://www.sciencedirect.com/science/article/pii/S1530891X24006578
Abstract
Background
We aim to use Natural Language Processing to automate the extraction and classification of thyroid cancer risk factors from pathology reports.
Methods
We analyzed 1410 surgical pathology reports from adult patients with papillary thyroid cancer from 2010 to 2019. Structured and unstructured reports were used to create a consensus-based ground truth dictionary and were categorized into modified recurrence risk levels. Unstructured reports were narrative, whereas structured reports followed standardized formats. We developed ThyroPath, a rule-based natural language processing pipeline, to extract thyroid cancer features and classify them into risk categories. Training used 225 reports (150 structured, 75 unstructured), and testing used 170 reports (120 structured, 50 unstructured). The pipeline's performance was assessed under both strict and lenient criteria using accuracy, precision, recall, and F1-score, a metric that combines precision and recall.
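To make the idea of a rule-based extraction step concrete, the following is a minimal Python sketch. It is not ThyroPath's implementation: the three regular expressions, the feature names, and the `extract_features` helper are hypothetical and were chosen only for illustration, and a real pipeline covering 18 features across structured and narrative reports would need far more rules and negation handling.

```python
import re

# Hypothetical rules for illustration only; these are NOT ThyroPath's rules.
# Each rule maps an assumed feature name to a pattern with one capture group.
RULES = {
    "tumor_size_cm": re.compile(
        r"tumor size\s*[:\-]?\s*(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE
    ),
    "extrathyroidal_extension": re.compile(
        r"extrathyroidal extension\s*[:\-]?\s*(present|absent|not identified)",
        re.IGNORECASE,
    ),
    "positive_lymph_nodes": re.compile(
        r"(\d+)\s+of\s+\d+\s+lymph nodes?\s+positive", re.IGNORECASE
    ),
}


def extract_features(report: str) -> dict:
    """Return the first match of each rule as a lowercase string."""
    found = {}
    for name, pattern in RULES.items():
        match = pattern.search(report)
        if match:
            found[name] = match.group(1).lower()
    return found


# Invented example text, not taken from any real pathology report.
report = (
    "Papillary thyroid carcinoma. Tumor size: 1.8 cm. "
    "Extrathyroidal extension: Not identified. "
    "2 of 7 lymph nodes positive for metastatic carcinoma."
)
features = extract_features(report)
print(features)
```

Keeping each rule as a named pattern makes it straightforward to add or audit features one at a time, which is one common reason to prefer rules when the target vocabulary is small and well defined.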
Results
In extraction tasks covering 18 thyroid cancer pathology features, ThyroPath achieved overall strict F1-scores of 93% for structured reports and 90% for unstructured reports. In classification tasks, ThyroPath-extracted information yielded an overall accuracy of 93% in categorizing reports by their guideline-based risk of recurrence: 76.9% for high risk, 86.8% for intermediate risk, and 100% for both low and very low risk. By contrast, ThyroPath achieved 100% accuracy across all risk categories when given human-extracted pathology information.
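The F1-scores reported above are the harmonic mean of precision and recall. The sketch below shows how such scores could be computed for one report under two matching criteria. The abstract does not define "strict" and "lenient", so the definitions here are assumptions: strict means the extracted value equals the reference value exactly, and lenient also accepts a value that contains, or is contained in, the reference. The `prf1` function and the sample data are hypothetical.

```python
def prf1(gold: dict, pred: dict, lenient: bool = False):
    """Precision, recall, and F1 for one report's feature -> value maps."""

    def same(g: str, p: str) -> bool:
        g, p = g.strip().lower(), p.strip().lower()
        # Assumed criteria: strict = exact match; lenient = substring overlap.
        return g == p or (lenient and (g in p or p in g))

    # A prediction is a true positive when its feature exists in the
    # reference and its value matches under the chosen criterion.
    tp = sum(1 for k, v in pred.items() if k in gold and same(gold[k], v))
    fp = len(pred) - tp  # predicted but wrong or not in the reference
    fn = len(gold) - tp  # reference features missed or extracted wrongly

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example data for illustration.
gold = {
    "tumor_size_cm": "1.8",
    "extrathyroidal_extension": "not identified",
    "margins": "negative",
}
pred = {
    "tumor_size_cm": "1.8",
    "extrathyroidal_extension": "not identified (see comment)",
}

strict = prf1(gold, pred)
lenient = prf1(gold, pred, lenient=True)
print("strict :", strict)
print("lenient:", lenient)
```

In this example the lenient criterion credits the partially matching extrathyroidal extension value, so the lenient F1 is higher than the strict F1 for the same predictions.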
Conclusions
ThyroPath shows promise for automating feature extraction and recurrence risk classification from thyroid pathology reports at large scale. It offers an alternative to laborious manual review and could help advance virtual registries. However, it requires further validation before implementation.
About the journal
Endocrine Practice (ISSN: 1530-891X), a peer-reviewed journal published twelve times a year, is the official journal of the American Association of Clinical Endocrinologists (AACE). The primary mission of Endocrine Practice is to enhance the health care of patients with endocrine diseases through continuing education of practicing endocrinologists.