Data augmentation based on large language models for radiological report classification

Journal: Knowledge-Based Systems (IF 7.2; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Published: 2024-11-20. DOI: 10.1016/j.knosys.2024.112745
Jaime Collado-Montañez, María-Teresa Martín-Valdivia, Eugenio Martínez-Cámara
Knowledge-Based Systems, Volume 308, Article 112745. Citations: 0.

Abstract

The International Classification of Diseases (ICD) is fundamental in healthcare because it provides a standardized framework for classifying and coding medical diagnoses and procedures, which makes it possible to understand international public health patterns and trends. However, manually classifying medical reports according to this standard is slow, tedious and error-prone, which shows the need for automated systems that relieve healthcare professionals of this task and reduce the number of errors. In this paper, we propose an automated classification system based on Natural Language Processing that analyzes radiological reports and classifies them according to ICD-10. Given the specialized language of radiological reports and the typically unbalanced class distribution of medical report collections, we propose a methodology that leverages large language models to augment the data of underrepresented classes and to adapt the classification language models to the specific language of radiological reports. The results show that the proposed methodology improves classification performance on the CARES corpus of radiological reports.
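The abstract does not describe the generation procedure itself. The following is a minimal Python sketch, under assumptions of our own, of the general idea of LLM-based oversampling for underrepresented classes: count the reports per ICD-10 code, compute how many synthetic reports each minority class needs to reach a target count, and build a few-shot prompt from real reports of that class. The names `augmentation_plan`, `build_prompt` and `augment`, the `target` threshold, the prompt wording, and the example codes are all hypothetical; `generate` stands in for whatever LLM is actually used, and the sketch treats each report as carrying a single code for simplicity.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# (report text, ICD-10 code); single-label simplification assumed for this sketch
Report = Tuple[str, str]


def augmentation_plan(labels: List[str], target: int) -> Dict[str, int]:
    """Return how many synthetic reports each class needs to reach `target` examples."""
    counts = Counter(labels)
    return {code: target - n for code, n in counts.items() if n < target}


def build_prompt(code: str, examples: List[str]) -> str:
    """Hypothetical few-shot prompt asking an LLM for a new report with the given code."""
    shots = "\n\n".join(f"Report:\n{text}" for text in examples)
    return (
        f"Write a new radiological report whose findings correspond to ICD-10 code {code}.\n\n"
        f"{shots}\n\nReport:\n"
    )


def augment(
    reports: List[Report],
    target: int,
    generate: Callable[[str], str],
    k: int = 3,
) -> List[Report]:
    """Append LLM-generated reports for every class with fewer than `target` examples."""
    plan = augmentation_plan([code for _, code in reports], target)
    synthetic: List[Report] = []
    for code, needed in plan.items():
        # Use up to k real reports of the minority class as in-context examples.
        examples = [text for text, c in reports if c == code][:k]
        for _ in range(needed):
            synthetic.append((generate(build_prompt(code, examples)), code))
    return reports + synthetic


if __name__ == "__main__":
    data = [("Report A", "J18"), ("Report B", "J18"), ("Report C", "J18"), ("Report D", "C34")]
    stub_llm = lambda prompt: "synthetic report"  # stand-in for a real LLM call
    balanced = augment(data, target=3, generate=stub_llm)
    print(Counter(code for _, code in balanced))
```

In practice, sampling with a nonzero temperature or rotating the in-context examples would likely be needed so that the generated reports are not near-duplicates; the sketch leaves that responsibility to `generate`.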
Source journal
Knowledge-Based Systems
Category: Engineering & Technology; Computer Science: Artificial Intelligence
CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months
About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.