Development of a large-scale medical visual question-answering dataset

IF 5.4 · Q1 (Medicine, Research & Experimental) · Communications Medicine · Published: 2024-12-21 · DOI: 10.1038/s43856-024-00709-2
Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie
{"title":"开发大规模医学视觉问题解答数据集","authors":"Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie","doi":"10.1038/s43856-024-00709-2","DOIUrl":null,"url":null,"abstract":"Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human–machine interaction and to develop a model capable of integrating complex visual and textual information. We constructed a large-scale medical visual-question answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks. Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods. The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches. Medical images play a crucial role in healthcare, but interpreting them accurately can be challenging. This study developed an artificial intelligence system that can answer questions about medical images, similar to how a medical expert would explain findings to patients. We created a large collection of medical images paired with questions and answers to train our AI system, covering various types of medical scans and conditions. Our system can generate detailed, accurate responses to questions about medical images, performing better than existing approaches. The system and dataset we developed are freely available to researchers, which should help advance the field of medical image interpretation and ultimately improve healthcare delivery. Zhang et al. investigate how the large body of publicly available images from the biomedical domain can be used to generate a new medical visual question-answering dataset. Along with the resulting benchmark dataset, the authors propose a novel visual-language model and compare its performance against existing approaches.","PeriodicalId":72646,"journal":{"name":"Communications medicine","volume":" ","pages":"1-13"},"PeriodicalIF":5.4000,"publicationDate":"2024-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s43856-024-00709-2.pdf","citationCount":"0","resultStr":"{\"title\":\"Development of a large-scale medical visual question-answering dataset\",\"authors\":\"Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie\",\"doi\":\"10.1038/s43856-024-00709-2\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. 
This study aims to redefine MedVQA as a generation task that mirrors human–machine interaction and to develop a model capable of integrating complex visual and textual information. We constructed a large-scale medical visual-question answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks. Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods. The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches. Medical images play a crucial role in healthcare, but interpreting them accurately can be challenging. This study developed an artificial intelligence system that can answer questions about medical images, similar to how a medical expert would explain findings to patients. We created a large collection of medical images paired with questions and answers to train our AI system, covering various types of medical scans and conditions. Our system can generate detailed, accurate responses to questions about medical images, performing better than existing approaches. The system and dataset we developed are freely available to researchers, which should help advance the field of medical image interpretation and ultimately improve healthcare delivery. Zhang et al. investigate how the large body of publicly available images from the biomedical domain can be used to generate a new medical visual question-answering dataset. Along with the resulting benchmark dataset, the authors propose a novel visual-language model and compare its performance against existing approaches.\",\"PeriodicalId\":72646,\"journal\":{\"name\":\"Communications medicine\",\"volume\":\" \",\"pages\":\"1-13\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2024-12-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.nature.com/articles/s43856-024-00709-2.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Communications medicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.nature.com/articles/s43856-024-00709-2\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MEDICINE, RESEARCH & EXPERIMENTAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communications medicine","FirstCategoryId":"1085","ListUrlMain":"https://www.nature.com/articles/s43856-024-00709-2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
Citations: 0

Abstract

Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human–machine interaction and to develop a model capable of integrating complex visual and textual information. We constructed a large-scale medical visual question-answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks. Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods. The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.

Medical images play a crucial role in healthcare, but interpreting them accurately can be challenging. This study developed an artificial intelligence system that can answer questions about medical images, similar to how a medical expert would explain findings to patients. We created a large collection of medical images paired with questions and answers to train our AI system, covering various types of medical scans and conditions. Our system can generate detailed, accurate responses to questions about medical images, performing better than existing approaches. The system and dataset we developed are freely available to researchers, which should help advance the field of medical image interpretation and ultimately improve healthcare delivery.

Zhang et al. investigate how the large body of publicly available images from the biomedical domain can be used to generate a new medical visual question-answering dataset. Along with the resulting benchmark dataset, the authors propose a novel visual-language model and compare its performance against existing approaches.
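The core architecture the abstract describes, a pre-trained vision encoder whose output is aligned with a large language model, can be illustrated with a minimal sketch. The names and sizes below (VisualPrefixAligner, num_visual_tokens, the 768/4096 hidden dimensions) are illustrative assumptions rather than the authors' implementation: a small projection layer maps pooled image features into the language model's embedding space, and the resulting "visual tokens" are prepended to the question embeddings before free-form answer generation.

    # Minimal sketch (not the paper's code) of aligning a frozen vision encoder's
    # features with a language model's embedding space for generative VQA.
    import torch
    import torch.nn as nn

    class VisualPrefixAligner(nn.Module):
        """Projects pooled vision-encoder features into an LLM's token-embedding space."""

        def __init__(self, vision_dim: int, llm_dim: int, num_visual_tokens: int = 32):
            super().__init__()
            self.num_visual_tokens = num_visual_tokens
            # One pooled image feature is expanded into a short sequence of pseudo-tokens.
            self.proj = nn.Linear(vision_dim, llm_dim * num_visual_tokens)

        def forward(self, image_features: torch.Tensor) -> torch.Tensor:
            # image_features: (batch, vision_dim), e.g. the pooled output of a frozen encoder
            batch = image_features.shape[0]
            visual_tokens = self.proj(image_features)
            return visual_tokens.view(batch, self.num_visual_tokens, -1)

    if __name__ == "__main__":
        vision_dim, llm_dim = 768, 4096                 # assumed ViT / LLM hidden sizes
        aligner = VisualPrefixAligner(vision_dim, llm_dim)
        image_features = torch.randn(2, vision_dim)     # stand-in for encoder output
        question_embeds = torch.randn(2, 24, llm_dim)   # stand-in for embedded question tokens
        # Prepend the projected visual tokens to the question embeddings; the combined
        # sequence would be fed to the LLM as inputs_embeds for autoregressive answering.
        inputs_embeds = torch.cat([aligner(image_features), question_embeds], dim=1)
        print(inputs_embeds.shape)                      # torch.Size([2, 56, 4096])

In such a setup, only the projection module (and optionally the language model) is trained on PMC-VQA-style image-question-answer triples, while the vision encoder stays frozen; this is one common way to realize the alignment described in the abstract, not necessarily the exact configuration used in the paper.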
