{"title":"Integrating language into medical visual recognition and reasoning: A survey","authors":"Yinbin Lu , Alan Wang","doi":"10.1016/j.media.2025.103514","DOIUrl":null,"url":null,"abstract":"<div><div>Vision-Language Models (VLMs) are regarded as efficient paradigms that build a bridge between visual perception and textual interpretation. For medical visual tasks, they can benefit from expert observation and physician knowledge extracted from textual context, thereby improving the visual understanding of models. Motivated by the fact that extensive medical reports are commonly attached to medical imaging, medical VLMs have triggered more and more interest, serving not only as self-supervised learning in the pretraining stage but also as a means to introduce auxiliary information into medical visual perception. To strengthen the understanding of such a promising direction, this survey aims to provide an in-depth exploration and review of medical VLMs for various visual recognition and reasoning tasks. Firstly, we present an introduction to medical VLMs. Then, we provide preliminaries and delve into how to exploit language in medical visual tasks from diverse perspectives. Further, we investigate publicly available VLM datasets and discuss the challenges and future perspectives. We expect that the comprehensive discussion about state-of-the-art medical VLMs will make researchers realize their significant potential.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"102 ","pages":"Article 103514"},"PeriodicalIF":10.7000,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical image analysis","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1361841525000623","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Vision-Language Models (VLMs) are regarded as efficient paradigms that bridge visual perception and textual interpretation. In medical visual tasks, they can benefit from expert observations and physician knowledge extracted from textual context, thereby improving models' visual understanding. Motivated by the fact that extensive medical reports commonly accompany medical imaging, medical VLMs have attracted growing interest, with language serving not only as a vehicle for self-supervised learning in the pretraining stage but also as a means of introducing auxiliary information into medical visual perception. To strengthen the understanding of this promising direction, this survey provides an in-depth exploration and review of medical VLMs for various visual recognition and reasoning tasks. First, we present an introduction to medical VLMs. Then, we provide preliminaries and examine how language can be exploited in medical visual tasks from diverse perspectives. Further, we investigate publicly available VLM datasets and discuss challenges and future perspectives. We expect that this comprehensive discussion of state-of-the-art medical VLMs will help researchers recognize their significant potential.
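To make the "self-supervised learning in the pretraining stage" concrete: many medical VLMs follow the CLIP-style recipe of aligning image embeddings with paired report embeddings via a symmetric contrastive objective. The sketch below illustrates that general objective only; it is not code from the survey, and the encoder outputs, batch size, and embedding dimension are illustrative assumptions.

```python
# Minimal sketch (assumed, CLIP-style) of the symmetric image-text
# contrastive objective commonly used to pretrain medical VLMs on
# image-report pairs. Encoder outputs are simulated with random tensors.
import torch
import torch.nn.functional as F


def contrastive_loss(img_emb: torch.Tensor,
                     txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/report embeddings."""
    # L2-normalize so the dot product equals cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    # (batch, batch) similarity matrix; the diagonal holds the true pairs.
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Match each image to its report and each report to its image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    batch, dim = 8, 512
    images = torch.randn(batch, dim)   # stand-in for vision-encoder features
    reports = torch.randn(batch, dim)  # stand-in for text-encoder features
    print(contrastive_loss(images, reports).item())
```

Minimizing this loss pulls each scan toward its own report in the shared embedding space while pushing it away from the other reports in the batch, which is how the textual side supplies a supervision signal without manual labels.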
Journal introduction:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.