Data augmentation based on large language models for radiological report classification

Knowledge-Based Systems (IF 7.6; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2025-01-10. Epub Date: 2024-11-20. DOI: 10.1016/j.knosys.2024.112745

Jaime Collado-Montañez, María-Teresa Martín-Valdivia, Eugenio Martínez-Cámara

Knowledge-Based Systems, vol. 308, Article 112745. https://www.sciencedirect.com/science/article/pii/S0950705124013790


Abstract

The International Classification of Diseases (ICD) is fundamental to healthcare: it provides a standardized framework for classifying and coding medical diagnoses and procedures, which makes it possible to understand international public health patterns and trends. However, manually classifying medical reports according to this standard is slow, tedious and error-prone, which shows the need for automated systems that relieve healthcare professionals of this task and reduce the number of errors. In this paper, we propose an automated classification system based on Natural Language Processing that analyzes radiological reports and classifies them according to the ICD-10. Given the specialized language of radiological reports and the typically imbalanced class distribution of medical report sets, we propose a methodology that leverages large language models to augment the data of underrepresented classes and to adapt the classification language models to the specific language of radiological reports. The results show that the proposed methodology improves classification performance on the CARES corpus of radiological reports.




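Source journal: Knowledge-Based Systems (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months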

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.