利用特定领域的大型语言模型对自由文本 MRI 报告进行 LI-RADS v2018 分类：一项可行性研究。

IF 5.6 2区医学 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING Insights into Imaging Pub Date : 2024-11-22 DOI:10.1186/s13244-024-01850-1

Mario Matute-González, Anna Darnell, Marc Comas-Cufí, Javier Pazó, Alexandre Soler, Belén Saborido, Ezequiel Mauro, Juan Turnes, Alejandro Forner, María Reig, Jordi Rimola

{"title":"利用特定领域的大型语言模型对自由文本 MRI 报告进行 LI-RADS v2018 分类：一项可行性研究。","authors":"Mario Matute-González, Anna Darnell, Marc Comas-Cufí, Javier Pazó, Alexandre Soler, Belén Saborido, Ezequiel Mauro, Juan Turnes, Alejandro Forner, María Reig, Jordi Rimola","doi":"10.1186/s13244-024-01850-1","DOIUrl":null,"url":null,"abstract":"Objective: To develop a domain-specific large language model (LLM) for LI-RADS v2018 categorization of hepatic observations based on free-text descriptions extracted from MRI reports.Material and methods: This retrospective study included 291 small liver observations, divided into training (n = 141), validation (n = 30), and test (n = 120) datasets. Of these, 120 were fictitious, and 171 were extracted from 175 MRI reports from a single institution. The algorithm's performance was compared to two independent radiologists and one hepatologist in a human replacement scenario, and considering two combined strategies (double reading with arbitration and triage). Agreement on LI-RADS category and dichotomic malignancy (LR-4, LR-5, and LR-M) were estimated using linear-weighted κ statistics and Cohen's κ, respectively. Sensitivity and specificity for LR-5 were calculated. The consensus agreement of three other radiologists served as the ground truth.Results: The model showed moderate agreement against the ground truth for both LI-RADS categorization (κ = 0.54 [95% CI: 0.42-0.65]) and the dichotomized approach (κ = 0.58 [95% CI: 0.42-0.73]). Sensitivity and specificity for LR-5 were 0.76 (95% CI: 0.69-0.86) and 0.96 (95% CI: 0.91-1.00), respectively. When the chatbot was used as a triage tool, performance improved for LI-RADS categorization (κ = 0.86/0.87 for the two independent radiologists and κ = 0.76 for the hepatologist), dichotomized malignancy (κ = 0.94/0.91 and κ = 0.87) and LR-5 identification (1.00/0.98 and 0.85 sensitivity, 0.96/0.92 and 0.92 specificity), with no statistical significance compared to the human readers' individual performance. Through this strategy, the workload decreased by 45%.Conclusion: LI-RADS v2018 categorization from unlabelled MRI reports is feasible using our LLM, and it enhances the efficiency of data curation.Critical relevance statement: Our proof-of-concept study provides novel insights into the potential applications of LLMs, offering a real-world example of how these tools could be integrated into a local workflow to optimize data curation for research purposes.Key points: Automatic LI-RADS categorization from free-text reports would be beneficial to workflow and data mining. LiverAI, a GPT-4-based model, supported various strategies improving data curation efficiency by up to 60%. LLMs can integrate into workflows, significantly reducing radiologists' workload.","PeriodicalId":13639,"journal":{"name":"Insights into Imaging","volume":"15 1","pages":"280"},"PeriodicalIF":5.6000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11584817/pdf/","citationCount":"0","resultStr":"{\"title\":\"Utilizing a domain-specific large language model for LI-RADS v2018 categorization of free-text MRI reports: a feasibility study.\",\"authors\":\"Mario Matute-González, Anna Darnell, Marc Comas-Cufí, Javier Pazó, Alexandre Soler, Belén Saborido, Ezequiel Mauro, Juan Turnes, Alejandro Forner, María Reig, Jordi Rimola\",\"doi\":\"10.1186/s13244-024-01850-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Objective: To develop a domain-specific large language model (LLM) for LI-RADS v2018 categorization of hepatic observations based on free-text descriptions extracted from MRI reports.Material and methods: This retrospective study included 291 small liver observations, divided into training (n = 141), validation (n = 30), and test (n = 120) datasets. Of these, 120 were fictitious, and 171 were extracted from 175 MRI reports from a single institution. The algorithm's performance was compared to two independent radiologists and one hepatologist in a human replacement scenario, and considering two combined strategies (double reading with arbitration and triage). Agreement on LI-RADS category and dichotomic malignancy (LR-4, LR-5, and LR-M) were estimated using linear-weighted κ statistics and Cohen's κ, respectively. Sensitivity and specificity for LR-5 were calculated. The consensus agreement of three other radiologists served as the ground truth.Results: The model showed moderate agreement against the ground truth for both LI-RADS categorization (κ = 0.54 [95% CI: 0.42-0.65]) and the dichotomized approach (κ = 0.58 [95% CI: 0.42-0.73]). Sensitivity and specificity for LR-5 were 0.76 (95% CI: 0.69-0.86) and 0.96 (95% CI: 0.91-1.00), respectively. When the chatbot was used as a triage tool, performance improved for LI-RADS categorization (κ = 0.86/0.87 for the two independent radiologists and κ = 0.76 for the hepatologist), dichotomized malignancy (κ = 0.94/0.91 and κ = 0.87) and LR-5 identification (1.00/0.98 and 0.85 sensitivity, 0.96/0.92 and 0.92 specificity), with no statistical significance compared to the human readers' individual performance. Through this strategy, the workload decreased by 45%.Conclusion: LI-RADS v2018 categorization from unlabelled MRI reports is feasible using our LLM, and it enhances the efficiency of data curation.Critical relevance statement: Our proof-of-concept study provides novel insights into the potential applications of LLMs, offering a real-world example of how these tools could be integrated into a local workflow to optimize data curation for research purposes.Key points: Automatic LI-RADS categorization from free-text reports would be beneficial to workflow and data mining. LiverAI, a GPT-4-based model, supported various strategies improving data curation efficiency by up to 60%. LLMs can integrate into workflows, significantly reducing radiologists' workload.\",\"PeriodicalId\":13639,\"journal\":{\"name\":\"Insights into Imaging\",\"volume\":\"15 1\",\"pages\":\"280\"},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2024-11-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11584817/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Insights into Imaging\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1186/s13244-024-01850-1\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Insights into Imaging","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s13244-024-01850-1","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}

引用次数: 0

摘要

目的根据从核磁共振成像报告中提取的自由文本描述，为LI-RADS v2018肝脏观察结果的分类开发特定领域的大语言模型（LLM）：这项回顾性研究包括 291 个小肝脏观察结果，分为训练数据集（n = 141）、验证数据集（n = 30）和测试数据集（n = 120）。其中，120个数据集是虚构的，171个数据集是从一家机构的175份磁共振成像报告中提取的。在人工替代的情况下，算法的性能与两名独立放射科医生和一名肝病医生的性能进行了比较，并考虑了两种组合策略（双读与仲裁和分流）。分别使用线性加权κ统计和Cohen's κ估算了LI-RADS类别和二分法恶性程度（LR-4、LR-5和LR-M）的一致性。计算了 LR-5 的敏感性和特异性。其他三位放射科医生的共识作为基本事实：该模型与LI-RADS分类（κ = 0.54 [95% CI: 0.42-0.65]）和二分法（κ = 0.58 [95% CI: 0.42-0.73]）的基本事实显示出中等程度的一致性。LR-5 的灵敏度和特异度分别为 0.76（95% CI：0.69-0.86）和 0.96（95% CI：0.91-1.00）。将聊天机器人用作分诊工具后，LI-RADS 分类（两名独立放射科医生的灵敏度分别为 κ = 0.86/0.87 和 κ = 0.76）、二分法恶性肿瘤（κ = 0.94/0.91 和 κ = 0.87）和 LR-5 鉴别（灵敏度分别为 1.00/0.98 和 0.85，特异度分别为 0.96/0.92 和 0.92）的性能均有所提高，但与人类读者的个人性能相比无统计学意义。通过这一策略，工作量减少了 45%：使用我们的 LLM 可以从无标记的 MRI 报告中对 LI-RADS v2018 进行分类，并提高了数据整理的效率：我们的概念验证研究为 LLM 的潜在应用提供了新的见解，为如何将这些工具集成到本地工作流程中以优化用于研究目的的数据整理提供了一个真实的例子：要点：从自由文本报告中自动进行 LI-RADS 分类有利于工作流程和数据挖掘。基于 GPT-4 模型的 LiverAI 支持各种策略，使数据整理效率提高了 60%。LLM 可以集成到工作流程中，大大减轻放射医师的工作量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Utilizing a domain-specific large language model for LI-RADS v2018 categorization of free-text MRI reports: a feasibility study.

Objective: To develop a domain-specific large language model (LLM) for LI-RADS v2018 categorization of hepatic observations based on free-text descriptions extracted from MRI reports.

Material and methods: This retrospective study included 291 small liver observations, divided into training (n = 141), validation (n = 30), and test (n = 120) datasets. Of these, 120 were fictitious, and 171 were extracted from 175 MRI reports from a single institution. The algorithm's performance was compared to two independent radiologists and one hepatologist in a human replacement scenario, and considering two combined strategies (double reading with arbitration and triage). Agreement on LI-RADS category and dichotomic malignancy (LR-4, LR-5, and LR-M) were estimated using linear-weighted κ statistics and Cohen's κ, respectively. Sensitivity and specificity for LR-5 were calculated. The consensus agreement of three other radiologists served as the ground truth.

Results: The model showed moderate agreement against the ground truth for both LI-RADS categorization (κ = 0.54 [95% CI: 0.42-0.65]) and the dichotomized approach (κ = 0.58 [95% CI: 0.42-0.73]). Sensitivity and specificity for LR-5 were 0.76 (95% CI: 0.69-0.86) and 0.96 (95% CI: 0.91-1.00), respectively. When the chatbot was used as a triage tool, performance improved for LI-RADS categorization (κ = 0.86/0.87 for the two independent radiologists and κ = 0.76 for the hepatologist), dichotomized malignancy (κ = 0.94/0.91 and κ = 0.87) and LR-5 identification (1.00/0.98 and 0.85 sensitivity, 0.96/0.92 and 0.92 specificity), with no statistical significance compared to the human readers' individual performance. Through this strategy, the workload decreased by 45%.

Conclusion: LI-RADS v2018 categorization from unlabelled MRI reports is feasible using our LLM, and it enhances the efficiency of data curation.

Critical relevance statement: Our proof-of-concept study provides novel insights into the potential applications of LLMs, offering a real-world example of how these tools could be integrated into a local workflow to optimize data curation for research purposes.

Key points: Automatic LI-RADS categorization from free-text reports would be beneficial to workflow and data mining. LiverAI, a GPT-4-based model, supported various strategies improving data curation efficiency by up to 60%. LLMs can integrate into workflows, significantly reducing radiologists' workload.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Insights into Imaging Medicine-Radiology, Nuclear Medicine and Imaging

CiteScore

7.30

自引率

4.30%

发文量

182

审稿时长

13 weeks

期刊介绍： Insights into Imaging (I³) is a peer-reviewed open access journal published under the brand SpringerOpen. All content published in the journal is freely available online to anyone, anywhere! I³ continuously updates scientific knowledge and progress in best-practice standards in radiology through the publication of original articles and state-of-the-art reviews and opinions, along with recommendations and statements from the leading radiological societies in Europe. Founded by the European Society of Radiology (ESR), I³ creates a platform for educational material, guidelines and recommendations, and a forum for topics of controversy. A balanced combination of review articles, original papers, short communications from European radiological congresses and information on society matters makes I³ an indispensable source for current information in this field. I³ is owned by the ESR, however authors retain copyright to their article according to the Creative Commons Attribution License (see Copyright and License Agreement). All articles can be read, redistributed and reused for free, as long as the author of the original work is cited properly. The open access fees (article-processing charges) for this journal are kindly sponsored by ESR for all Members. The journal went open access in 2012, which means that all articles published since then are freely available online.