The increasing number of computed tomography (CT) examinations and the time-intensive nature of manual analysis call for efficient automated methods to help radiologists manage their growing workload. While deep learning approaches typically classify abnormalities from three-dimensional (3D) CT images alone, radiologists also draw on clinical indications and patient demographics, such as age and sex, when making a diagnosis. This study aims to enhance multilabel abnormality classification and automated report generation by integrating imaging and non-imaging data. We propose a multimodal deep learning model that combines 3D chest CT scans, clinical information reports, and patient age and sex to improve diagnostic accuracy. Our method extracts visual features from 3D volumes with a visual encoder, textual features from clinical indications with a pretrained language model, and demographic features with a lightweight feedforward neural network. These features are projected into a shared representation space, concatenated, and processed by a projection head to predict abnormalities. For the multilabel classification task, incorporating clinical indications and patient demographics into an existing visual encoder, CT-Net, raises the F1 score to 51.58, an improvement over CT-Net alone. For the automated report generation task, we extend two existing methods, CT2Rep and CT-AGRG, by integrating clinical indications and demographic data. This integration improves Clinical Efficacy metrics, yielding F1 score gains for both the CT2Rep and CT-AGRG extensions. Our findings suggest that incorporating patient demographics and clinical information into deep learning frameworks can significantly improve automated CT scan analysis. This approach has the potential to enhance radiological workflows and support more comprehensive and accurate abnormality detection in clinical practice.
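To make the fusion step concrete, the following is a minimal PyTorch sketch of the described pipeline, not the authors' implementation: per-modality features are projected into a shared space, concatenated, and passed through a projection head for multilabel prediction. All class names, layer sizes, and the default label count are hypothetical; the visual and text features are assumed to come from a pretrained 3D encoder (e.g., a CT-Net-style network) and a pretrained language model, respectively.

```python
import torch
import torch.nn as nn

class MultimodalAbnormalityClassifier(nn.Module):
    """Sketch: fuse visual, textual, and demographic features for
    multilabel abnormality classification (illustrative only)."""

    def __init__(self, visual_dim, text_dim, demo_dim=2,
                 shared_dim=256, num_labels=18):  # num_labels is an assumption
        super().__init__()
        # Project each modality into a shared representation space
        self.visual_proj = nn.Linear(visual_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)
        # Lightweight feedforward network for age and sex
        self.demo_net = nn.Sequential(
            nn.Linear(demo_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim),
        )
        # Projection head over the concatenated modality features
        self.head = nn.Sequential(
            nn.Linear(3 * shared_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, num_labels),
        )

    def forward(self, visual_feat, text_feat, demographics):
        # visual_feat: features extracted from the 3D CT volume
        # text_feat: pooled embedding of the clinical indication text
        # demographics: tensor holding [age, sex]
        v = self.visual_proj(visual_feat)
        t = self.text_proj(text_feat)
        d = self.demo_net(demographics)
        fused = torch.cat([v, t, d], dim=-1)
        return self.head(fused)  # multilabel logits


# Usage with dummy feature tensors (batch of 4)
model = MultimodalAbnormalityClassifier(visual_dim=512, text_dim=768)
logits = model(torch.randn(4, 512), torch.randn(4, 768), torch.randn(4, 2))
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (4, 18)).float())
```

In such a setup, a sigmoid over the logits (via `BCEWithLogitsLoss` during training) yields independent per-abnormality probabilities, which matches the multilabel formulation described above.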