CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images.

IF 4.7 · CAS Zone 2 (Medicine) · Q1 (Radiology, Nuclear Medicine & Medical Imaging) · European Radiology · Pub Date: 2025-01-15 · DOI: 10.1007/s00330-024-11339-6
Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon
{"title":"xr - llava:用于解释胸部x线图像的多模态大语言模型。","authors":"Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon","doi":"10.1007/s00330-024-11339-6","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>This study aimed to develop an open-source multimodal large language model (CXR-LLaVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.</p><p><strong>Materials and methods: </strong>For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLaVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting.</p><p><strong>Results: </strong>The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.56 for six major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports.</p><p><strong>Conclusion: </strong>This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts.</p><p><strong>Key points: </strong>Question How can a multimodal large language model be adapted to interpret chest X-rays and generate radiologic reports? Findings The developed CXR-LLaVA model effectively detects major pathological findings in chest X-rays and generates radiologic reports with a higher accuracy compared to general-purpose models. 
Clinical relevance This study demonstrates the potential of multimodal large language models to support radiologists by autonomously generating chest X-ray reports, potentially reducing diagnostic workloads and improving radiologist efficiency.</p>","PeriodicalId":12076,"journal":{"name":"European Radiology","volume":" ","pages":""},"PeriodicalIF":4.7000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images.\",\"authors\":\"Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon\",\"doi\":\"10.1007/s00330-024-11339-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>This study aimed to develop an open-source multimodal large language model (CXR-LLaVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.</p><p><strong>Materials and methods: </strong>For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLaVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting.</p><p><strong>Results: </strong>The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.56 for six major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports.</p><p><strong>Conclusion: </strong>This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts.</p><p><strong>Key points: </strong>Question How can a multimodal large language model be adapted to interpret chest X-rays and generate radiologic reports? Findings The developed CXR-LLaVA model effectively detects major pathological findings in chest X-rays and generates radiologic reports with a higher accuracy compared to general-purpose models. 
Clinical relevance This study demonstrates the potential of multimodal large language models to support radiologists by autonomously generating chest X-ray reports, potentially reducing diagnostic workloads and improving radiologist efficiency.</p>\",\"PeriodicalId\":12076,\"journal\":{\"name\":\"European Radiology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.7000,\"publicationDate\":\"2025-01-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European Radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s00330-024-11339-6\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00330-024-11339-6","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0

Abstract


Objective: This study aimed to develop an open-source multimodal large language model (CXR-LLaVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.

Materials and methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLaVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting.
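The abstract only outlines the architecture: a vision transformer pre-trained on labeled CXRs, connected to a language model following the LLaVA design, then fine-tuned on report text. As a rough illustration of that wiring, here is a minimal PyTorch sketch; the module names, dimensions, and the injected encoder/LLM are assumptions, not the authors' code:

```python
# LLaVA-style wiring, for illustration only. The actual CXR-LLaVA modules,
# dimensions, and training code are not specified in the abstract.
import torch
import torch.nn as nn

class LLaVAStyleModel(nn.Module):
    """Vision transformer -> linear projector -> causal LLM."""

    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder              # pre-trained on Dataset 1 (labels)
        self.projector = nn.Linear(vision_dim, llm_dim)   # image features -> LLM token space
        self.llm = llm                                    # fine-tuned on Dataset 2 (reports)

    def forward(self, pixel_values: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        patch_feats = self.vision_encoder(pixel_values)   # (B, n_patches, vision_dim)
        image_tokens = self.projector(patch_feats)        # (B, n_patches, llm_dim)
        # Prepend projected image tokens to the text embeddings so the LLM
        # conditions report generation on the X-ray, as in LLaVA.
        return self.llm(torch.cat([image_tokens, text_embeds], dim=1))
```

Under this scheme the two-stage recipe in the abstract maps directly: Dataset 1 trains the vision encoder, and Dataset 2 fine-tunes the projection and language model on image-report pairs.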

Results: The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.56 for six major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports.
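The "average F1 score ... for six major pathological findings" reads as a macro average: F1 computed per finding, then averaged. A minimal sketch of that computation follows; the finding names and data are purely hypothetical, since the abstract does not enumerate them:

```python
# Macro-averaged F1 across findings, for illustration; the six findings and
# the labels/predictions here are hypothetical, not the paper's data.
import numpy as np
from sklearn.metrics import f1_score

findings = ["consolidation", "effusion", "pneumothorax",
            "cardiomegaly", "edema", "atelectasis"]       # hypothetical set
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, len(findings)))    # ground-truth labels
y_pred = rng.integers(0, 2, size=(100, len(findings)))    # model predictions

per_finding = [f1_score(y_true[:, i], y_pred[:, i]) for i in range(len(findings))]
print({f: round(s, 2) for f, s in zip(findings, per_finding)})
print("average F1:", round(float(np.mean(per_finding)), 2))
```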

Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts.

Key points:
Question: How can a multimodal large language model be adapted to interpret chest X-rays and generate radiologic reports?
Findings: The developed CXR-LLaVA model effectively detects major pathological findings in chest X-rays and generates radiologic reports with higher accuracy than general-purpose models.
Clinical relevance: This study demonstrates the potential of multimodal large language models to support radiologists by autonomously generating chest X-ray reports, potentially reducing diagnostic workloads and improving radiologist efficiency.

Source journal: European Radiology (Medicine - Nuclear Medicine)
CiteScore: 11.60
Self-citation rate: 8.50%
Annual publications: 874
Review turnaround: 2-4 weeks
Journal description: European Radiology (ER) continuously updates scientific knowledge in radiology by publication of strong original articles and state-of-the-art reviews written by leading radiologists. A well-balanced combination of review articles, original papers, short communications from European radiological congresses, and information on society matters makes ER an indispensable source for current information in this field. This is the Journal of the European Society of Radiology, and the official journal of a number of societies. From 2004-2008, supplements to European Radiology were published under its companion, European Radiology Supplements, ISSN 1613-3749.