Study-level cross-modal retrieval of chest x-ray images and reports with adapter-based fine-tuning
Yingjie Chen, Weihua Ou, Zhifan Gao, Lingge Lai, Yang Wu, Qianqian Chen
Physics in Medicine and Biology, vol. 70, no. 4, published 2025-02-13. DOI: 10.1088/1361-6560/adaf05
Citations: 0
Abstract
Cross-modal retrieval is crucial for improving clinical decision-making and report generation. However, current methods mainly link single images with reports, overlooking the need to examine multiple images together in real clinical practice. Additionally, differences in imaging equipment, scanning parameters, geographic regions, and reporting styles cause inconsistent data distributions across chest x-rays and reports, challenging model reliability and generalization. To address these challenges, we propose a study-level cross-modal retrieval task for chest x-rays and reports that better meets clinical needs. Our study-level approach performs cross-modal retrieval between the multiple images and the report of a patient exam: given a set of study-level images or a report, our method retrieves the relevant reports or images from a database, reflecting clinical scenarios more realistically than traditional single-image methods. Furthermore, we introduce an adapter-based pre-training and fine-tuning method to enhance model generalization across diverse data distributions. Comprehensive experiments demonstrate the advantages of our method in both stages. In the pre-training stage, we compare against the latest techniques, showing the effectiveness of integrating study-level image features with a vision transformer and aligning them with report features. In the fine-tuning stage, we compare our adapter-based method with state-of-the-art full-parameter fine-tuning and conduct ablation studies against common head-based and full-parameter baselines, demonstrating its efficiency and potential for practical clinical application. In summary, this study proposes a study-level cross-modal retrieval task for matching chest x-ray images and reports; by employing a pre-training and fine-tuning strategy with adapter modules, it addresses data distribution inconsistency and improves retrieval performance.
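The abstract describes three components without implementation detail: fusing multiple image embeddings from one study, inserting lightweight adapters for fine-tuning, and ranking cross-modal matches from a database. A minimal sketch of those ideas follows; all names are illustrative, mean pooling stands in for the paper's ViT-based study-level fusion, and the bottleneck adapter (down-project, nonlinearity, up-project, residual) is a common adapter design assumed here, not taken from the paper itself:

```python
import numpy as np

def pool_study(image_feats):
    """Aggregate embeddings of all images in one study into a single vector.
    Mean pooling is a simple stand-in for the paper's ViT-based fusion."""
    return image_feats.mean(axis=0)

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add.
    During fine-tuning only W_down/W_up would be trained; the backbone stays frozen."""
    h = np.maximum(x @ W_down, 0.0)
    return x + h @ W_up

def retrieve(query_vec, db_vecs):
    """Rank database entries (e.g. report embeddings) by cosine similarity
    to a study-level query embedding; returns indices, best match first."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    return np.argsort(-(db @ q))
```

Because the adapter output keeps the backbone's embedding dimension, only the two small projection matrices add trainable parameters, which is the efficiency argument the abstract makes against full-parameter fine-tuning.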
About the journal:
The development and application of theoretical, computational and experimental physics to medicine, physiology and biology. Topics covered are: therapy physics (including ionizing and non-ionizing radiation); biomedical imaging (e.g. x-ray, magnetic resonance, ultrasound, optical and nuclear imaging); image-guided interventions; image reconstruction and analysis (including kinetic modelling); artificial intelligence in biomedical physics and analysis; nanoparticles in imaging and therapy; radiobiology; radiation protection and patient dose monitoring; radiation dosimetry.