Objective:
Radiology report generation (RRG) is a transformative technology in radiology imaging that aims to address the critical need for consistency and comprehensiveness in diagnostic interpretation. Although recent advances in graph-based representation learning have demonstrated strong performance in disease progression modeling, their application to radiology report generation still suffers from three inherent limitations: (i) semantic separation between local image features and free-text descriptions, (ii) inherent noise in automated medical concept annotation, and (iii) a lack of anatomical constraints in cross-modal attention mechanisms.
Method:
This study proposes a pseudo-label and knowledge-guided contrastive learning (PKCL) framework, which addresses these issues through a novel fusion of dynamic query learning and knowledge-guided contrastive learning. The PKCL framework employs a trainable cross-modal query matrix (QM) that learns shared representations via a parameter-sharing self-attention mechanism between the imaging and text encoders. During training, the QM queries disease-related visual regions referenced in reports, enabling dynamic alignment between radiological features and textual descriptions at both training and inference time. In addition, the method combines pseudo labels with an adaptive top-k weighted feature fusion strategy to enhance learning from comparisons against standard cases, and it leverages knowledge graphs pre-built with the XRayVision model (Cohen et al., 2022) to account for disease relationships and anatomical dependencies, thereby improving the clinical accuracy of generated reports.
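The adaptive top-k weighted feature fusion step can be sketched as follows. This is a minimal pure-Python illustration only: the function name, the softmax weighting over pseudo-label scores, and the choice of selecting the k highest-scoring feature vectors are assumptions for exposition, not the authors' exact formulation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_weighted_fusion(features, scores, k):
    """Fuse the k highest-scoring feature vectors into one vector.

    features: list of equal-length feature vectors (e.g., region embeddings)
    scores:   one relevance score per vector (e.g., from pseudo labels)
    k:        number of top-scoring vectors to keep

    The kept vectors are combined as a weighted sum, with weights given
    by a softmax over their scores, so more relevant features dominate.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in ranked])
    dim = len(features[0])
    fused = [0.0] * dim
    for w, i in zip(weights, ranked):
        for d in range(dim):
            fused[d] += w * features[i][d]
    return fused
```

For example, with three 2-d feature vectors where two share the top score, the fusion keeps those two and averages them with equal softmax weights, discarding the low-scoring vector entirely.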
Results:
Comprehensive evaluations on the IU-Xray and MIMIC-CXR datasets demonstrate that PKCL achieves state-of-the-art performance on both natural language generation and clinical efficacy metrics. Specifically, it obtains BLEU-1/ROUGE-L scores of 0.499/0.374 on IU-Xray and 0.346/0.277 on MIMIC-CXR, outperforming prior methods such as R2GEN and CMCL.
Furthermore, PKCL exhibits robust generalization on the out-of-domain Montgomery County X-ray Set, effectively handling its low-resource conditions and brief, diagnostic-level textual supervision.
Conclusion:
The framework’s ability to maintain semantic consistency when generating clinically relevant reports represents a significant advancement over existing methods, particularly in capturing the subtle relationships between radiological findings and their textual descriptions.