Bootstrapping BI-RADS classification using large language models and transformers in breast magnetic resonance imaging reports.

IF 6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Visual Computing for Industry Biomedicine and Art; Pub Date: 2025-04-03; DOI: 10.1186/s42492-025-00189-8

Yuxin Liu, Xiang Zhang, Weiwei Cao, Wenju Cui, Tao Tan, Yuqin Peng, Jiayi Huang, Zhen Lei, Jun Shen, Jian Zheng

Visual Computing for Industry Biomedicine and Art, 8(1):8, published 2025-04-03. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11968601/pdf/

Citations: 0

Abstract

Breast cancer is one of the most common malignancies among women globally. Magnetic resonance imaging (MRI), as the final non-invasive diagnostic tool before biopsy, provides detailed free-text reports that support clinical decision-making. Effective use of the information in MRI reports is therefore crucial for making reliable decisions in patient care. This study proposes a novel method for BI-RADS classification using breast MRI reports. Large language models are employed to transform free-text reports into structured reports. Specifically, category information that is missing from the free-text reports (missing category information, MCI) is supplemented by assigning default values to the corresponding categories in the structured reports. To ensure data privacy, a locally deployed Qwen-Chat model is employed. Furthermore, to enhance domain-specific adaptability, a knowledge-driven prompt is designed, and the Qwen-7B-Chat model is fine-tuned specifically for structuring breast MRI reports. To prevent information loss and enable comprehensive learning of all report details, a fusion strategy is introduced that combines free-text and structured reports to train the classification model. Experimental results show that the proposed BI-RADS classification method outperforms existing report classification methods across multiple evaluation metrics. In addition, an external test set from a different hospital is used to validate the robustness of the proposed approach. The proposed structuring method surpasses GPT-4o in performance. Ablation experiments confirm that the knowledge-driven prompt, MCI, and the fusion strategy are crucial to the model's performance.
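The two data-preparation ideas in the abstract, filling missing categories with defaults (MCI) and fusing free-text with structured reports, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the category names in `CATEGORIES`, the `"not mentioned"` default, the `[SEP]` separator, and the choice of plain string concatenation as the fusion step are hypothetical, since the abstract does not specify the schema, the default values, or how the two report forms are combined.

```python
from typing import Dict, Optional

# Hypothetical category names; the abstract does not list the actual schema.
CATEGORIES = ["mass_shape", "mass_margin", "enhancement_pattern", "kinetic_curve"]

# Assumed default for categories the free-text report never mentions.
DEFAULT_VALUE = "not mentioned"


def fill_missing_categories(structured: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return a report with every category present, using the default where absent."""
    filled = {}
    for name in CATEGORIES:
        value = structured.get(name)
        # Treat None, empty, and whitespace-only values as missing.
        filled[name] = value.strip() if value and value.strip() else DEFAULT_VALUE
    return filled


def fuse(free_text: str, structured: Dict[str, str], sep: str = " [SEP] ") -> str:
    """Concatenate the free-text report with a flattened structured report.

    Concatenation is an illustrative stand-in for the paper's fusion strategy.
    """
    flat = "; ".join(f"{key}: {value}" for key, value in structured.items())
    return f"{free_text.strip()}{sep}{flat}"


if __name__ == "__main__":
    report = "Irregular mass in the left breast with rapid initial enhancement."
    # Stand-in for the structured output an LLM might return for this report.
    llm_output = {"mass_shape": "irregular", "kinetic_curve": None}
    print(fuse(report, fill_missing_categories(llm_output)))
```

In this sketch the fused string would be the input to a text classifier; keeping the original free text alongside the structured fields reflects the abstract's stated goal of avoiding information loss during structuring.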


Source journal

Visual Computing for Industry Biomedicine and Art Multiple-

CiteScore

5.60

Self-citation rate

0.00%

Publication count