Enhancing Thyroid Pathology With Artificial Intelligence: Automated Data Extraction From Electronic Health Reports Using RUBY.

IF 3.3, Q2 Oncology. JCO Clinical Cancer Informatics. Pub Date: 2024-12-01. Epub Date: 2024-12-10. DOI: 10.1200/CCI.23.00263
Dorian Culié, Renaud Schiappa, Sara Contu, Eva Seutin, Tanguy Pace-Loscos, Gilles Poissonnet, Agathe Villarme, Alexandre Bozec, Emmanuel Chamorey
Volume 8, article e2300263. Publication type: Journal Article.
Citations: 0

Abstract

Enhancing Thyroid Pathology With Artificial Intelligence: Automated Data Extraction From Electronic Health Reports Using RUBY.

Purpose: Thyroid nodules are common in the general population, and assessing their malignancy risk is the initial step in care. Surgical exploration remains the sole definitive option for indeterminate nodules. Extensive database access is crucial for improving this initial assessment. Our objective was to develop an automated process using convolutional neural networks (CNNs) to extract and structure biomedical insights from electronic health reports (EHRs) in a large thyroid pathology cohort.

Materials and methods: We randomly selected 1,500 patients with thyroid pathology from our cohort for model development and an additional 100 for testing. We then divided the cohort of 1,500 patients into training (70%) and validation (30%) sets. We used EHRs from initial surgeon visits, preanesthesia visits, ultrasound, surgery, and anatomopathology reports. We selected 42 variables of interest and had them manually annotated by a clinical expert. We developed RUBY-THYRO using six distinct CNN models from SpaCy, supplemented with keyword extraction rules and postprocessing. Evaluation against a gold standard database included calculating precision, recall, and F1 score.
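
The split and the evaluation metrics described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the fixed random seed, and the per-patient counting convention (a wrong extracted value counts as both a false positive and a false negative) are assumptions, since the abstract does not say how matches were scored.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs and split them into training and validation lists.

    The seed is an arbitrary choice made only for reproducibility.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def precision_recall_f1(gold, predicted):
    """Score one variable against the gold standard.

    gold, predicted: dicts mapping patient ID -> value (None means absent).
    Assumed convention: an extracted value equal to the gold value is a
    true positive; any other extracted value is a false positive; any gold
    value that was not correctly extracted is a false negative.
    """
    tp = fp = fn = 0
    for pid in gold.keys() | predicted.keys():
        g = gold.get(pid)
        p = predicted.get(pid)
        if p is not None and p == g:
            tp += 1
        else:
            if p is not None:
                fp += 1
            if g is not None:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# 1,500 development patients split 70% / 30%, as in the abstract.
train_ids, validation_ids = split_cohort(range(1500))

# Hypothetical values for a single variable, for illustration only.
gold = {1: "pT1a", 2: "pT2", 3: "pT3a", 4: None}
predicted = {1: "pT1a", 2: "pT1b", 3: None, 4: "pT2"}
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

With 42 variables, the scoring function would be called once per variable and per set (validation and test), giving one precision, recall, and F1 triple for each.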

Results: Performance remained consistent across the test and validation sets, with the majority of variables (30/42) achieving performance metrics exceeding 90% for all metrics in both sets. Results differed according to the variables; pathologic tumor stage score achieved 100% in precision, recall, and F1 score, versus 45%, 28%, and 32% for the number of nodules in the test set, respectively. Surgical and preanesthesia reports demonstrated particularly high performance.
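
The "30/42" figure is a count of variables whose precision, recall, and F1 score all exceed 90% in both sets. A sketch of that count is shown below, under stated assumptions: the function name and data layout are hypothetical, and the only scores used are the two test-set examples quoted in the abstract, because validation-set values are not given there.

```python
def variables_above_threshold(metrics_by_set, threshold=0.90):
    """Return the variables whose precision, recall, and F1 all exceed
    the threshold in every evaluation set supplied.

    metrics_by_set: dict mapping set name -> {variable: (precision, recall, f1)}.
    """
    # Only variables scored in every set can qualify.
    common = set.intersection(*(set(m) for m in metrics_by_set.values()))
    return sorted(
        v for v in common
        if all(score > threshold
               for m in metrics_by_set.values()
               for score in m[v])
    )


# Test-set scores quoted in the abstract; variable keys are illustrative names.
test_set = {
    "pathologic_tumor_stage": (1.00, 1.00, 1.00),
    "number_of_nodules": (0.45, 0.28, 0.32),
}

passing = variables_above_threshold({"test": test_set})
```

Passing both the validation and test dictionaries in `metrics_by_set` would reproduce the "in both sets" condition; the single-set call here only reflects what the abstract reports.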

Conclusion: Our study successfully implemented a CNN-based natural language processing (NLP) approach for extracting and structuring data from various EHRs in thyroid pathology. This highlights the potential of artificial intelligence-driven NLP techniques for extensive and cost-effective data extraction, paving the way for creating comprehensive, hospital-wide data warehouses.

Source journal: JCO Clinical Cancer Informatics
CiteScore: 6.20
Self-citation rate: 4.80%
Publication volume: 190