Deep understanding of radiology reports: leveraging dynamic convolution in chest X-ray images

IF 1.5 | CAS Zone 4, Computer Science | JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS | Data Technologies and Applications | Pub Date: 2023-11-29 | DOI: 10.1108/dta-07-2023-0307

Tarun Jaiswal, Manju Pandey, Priyanka Tripathi



Abstract

Purpose

This study investigates how a dynamic convolutional encoder–decoder network (DyCNN) advances chest X-ray (CXR) image captioning. Conventional convolutional neural networks (CNNs) apply the same operation to every pixel of an image and do not capture both local and global contextual information effectively. To address this, we integrate a dynamic convolution operation into the encoder, improving image encoding quality and disease detection. In addition, a decoder based on the gated recurrent unit (GRU) performs language modeling, and an attention network is incorporated to enhance consistency. This combination improves feature extraction and, as a radiologist would, focuses selectively on important regions, producing coherent captions that carry useful clinical information.
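The abstract does not spell out the dynamic convolution operation itself. As an illustration only, the sketch below follows one reading based on the cited Verelst and Tuytelaars (2019) work: a gating branch decides, per spatial position, whether a residual 3×3 convolution is executed, so the operation is no longer uniform across pixels. The gate (a 1×1 weight and bias reduced to two scalars, hard-thresholded at 0), the kernel and all function names are hypothetical; in a trained network the gate would be learned end to end, and the real encoder is a full ResNet-101.

```python
from typing import List, Tuple

Grid = List[List[float]]


def conv3x3_at(x: Grid, kernel: Grid, i: int, j: int) -> float:
    """3x3 cross-correlation evaluated at a single position (i, j), zero padding."""
    h, w = len(x), len(x[0])
    acc = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ii, jj = i + di, j + dj
            if 0 <= ii < h and 0 <= jj < w:
                acc += kernel[di + 1][dj + 1] * x[ii][jj]
    return acc


def dynamic_residual_block(
    x: Grid, kernel: Grid, gate_w: float, gate_b: float
) -> Tuple[Grid, List[List[int]], int]:
    """Spatially gated residual block (hypothetical sketch).

    A per-pixel gate (scalar weight and bias, hard threshold at 0) decides where
    the 3x3 convolution runs; elsewhere only the identity path is kept, so that
    computation is skipped entirely.
    """
    h, w = len(x), len(x[0])
    mask = [[1 if gate_w * x[i][j] + gate_b > 0 else 0 for j in range(w)] for i in range(h)]
    out = [row[:] for row in x]  # identity (skip) path everywhere
    executed = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                # Residual branch ReLU(conv3x3(x)) is added only at gated-on positions
                out[i][j] += max(0.0, conv3x3_at(x, kernel, i, j))
                executed += 1
    return out, mask, executed


if __name__ == "__main__":
    x = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 5.0, 5.0, 0.0],
         [0.0, 5.0, 5.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
    ones = [[1.0] * 3 for _ in range(3)]
    out, mask, executed = dynamic_residual_block(x, ones, gate_w=1.0, gate_b=-1.0)
    print(executed)   # 4: only the bright centre positions run the convolution
    print(out[1][1])  # 25.0 = 5 (identity) + 20 (sum of the 3x3 window)
```

Only 4 of the 16 positions execute the convolution here; a standard CNN layer would process all 16 identically.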

Design/methodology/approach

In this study, we present a new report generation approach that uses a ResNet-101 with dynamic convolution (DyCNN) as the encoder (Verelst and Tuytelaars, 2019), a GRU as the decoder (Dey and Salem, 2017; Pan et al., 2020) and an attention network (see Figure 1). This integration extends image encoding and sequential caption generation beyond conventional CNN architectures. Because it adapts its receptive fields dynamically, the DyCNN captures features at varying scales within CXR images. This adaptability increases the granularity of feature extraction, allowing localized abnormalities and fine structural detail to be represented precisely. By building this flexibility into the encoding process, the model can distil meaningful, contextually rich features from the radiographic data. The attention mechanism, in turn, lets the model focus selectively on different regions of the image during caption generation by assigning them different importance weights, mimicking human perception. In parallel, the GRU-based decoder generates the captions smoothly and sequentially.
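The abstract names the encoder, attention network and GRU decoder but not their exact equations. The sketch below shows one decoding step under assumed choices: plain dot-product attention over region feature vectors (the paper's attention form is not stated in the abstract) and a standard GRU cell whose input is the previous word embedding concatenated with the attended context. All names, shapes and parameters are hypothetical; a real model would use learned weights and a deep-learning framework.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs: Vec) -> Vec:
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v: Vec) -> Vec:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(scores: Vec) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions: List[Vec], h: Vec) -> Tuple[Vec, Vec]:
    """Dot-product attention: weight each region feature by its similarity to the decoder state."""
    weights = softmax([sum(a * b for a, b in zip(r, h)) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[k] for w, r in zip(weights, regions)) for k in range(dim)]
    return context, weights


def gru_step(x: Vec, h: Vec, p: Dict[str, list]) -> Vec:
    """One GRU update with update gate z, reset gate r and candidate state."""
    z = sigmoid(vadd(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"]))
    r = sigmoid(vadd(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"]))
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a) for a in vadd(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    # Convention used here: h' = (1 - z) * h + z * candidate
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def decode_step(word_emb: Vec, h: Vec, regions: List[Vec], p: Dict[str, list]) -> Tuple[Vec, Vec]:
    """Attend over image regions, then feed [word embedding; context] to the GRU."""
    context, weights = attend(regions, h)
    return gru_step(word_emb + context, h, p), weights


def zero_params(d_in: int, d_h: int) -> Dict[str, list]:
    """All-zero parameters (for a deterministic demo only)."""
    p: Dict[str, list] = {}
    for g in ("z", "r", "h"):
        p["W" + g] = [[0.0] * d_in for _ in range(d_h)]
        p["U" + g] = [[0.0] * d_h for _ in range(d_h)]
        p["b" + g] = [0.0] * d_h
    return p


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]  # three 2-d region features
    h = [1.0, -2.0]
    h_new, weights = decode_step([0.3, 0.7], h, regions, zero_params(4, 2))
    print(weights)  # the third region (score 2.0) receives the largest weight
    print(h_new)    # [0.5, -1.0]: with zero weights z = 0.5 and the candidate is 0
```

Conventions differ on whether z multiplies the previous state or the candidate; the choice only flips the gate's interpretation, not the cell's expressiveness.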

Findings

The findings show the gains that a dynamic convolutional encoder–decoder network (DyCNN) brings to chest X-ray image captioning. In experiments on the IU-Chest X-ray dataset, the proposed model outperformed other state-of-the-art approaches, achieving BLEU_1, BLEU_2, BLEU_3 and BLEU_4 scores of 0.591, 0.347, 0.277 and 0.155, respectively. These results indicate that the model can produce accurate radiology reports that support image interpretation and clinical decision-making.
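BLEU_n combines the clipped n-gram precisions for n = 1 to N through a geometric mean and multiplies the result by a brevity penalty. The sentence-level sketch below (uniform weights, no smoothing, function names hypothetical) shows how such a score is computed. Scores like those reported above are usually computed at corpus level and depend on tokenization and the evaluation toolkit, so this code is not meant to reproduce them.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """BLEU_max_n with uniform weights, clipped n-gram precision and brevity penalty."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        max_ref: Counter = Counter()
        for ref in references:
            max_ref |= ngram_counts(ref, n)  # per-n-gram maximum count over references
        clipped = sum(min(count, max_ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        if clipped == 0:
            return 0.0  # no smoothing: a zero precision makes the geometric mean zero
        log_precision += math.log(clipped / total) / max_n
    c = len(candidate)
    # Effective reference length: the one closest to the candidate length
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    ref = "the lungs are clear bilaterally".split()
    cand = "the lungs are clear".split()
    # Perfect unigram precision, but the short output is penalized: exp(1 - 5/4)
    print(sentence_bleu(cand, [ref], max_n=1))  # ≈ 0.7788
```

The brevity penalty is what stops a model from scoring well by emitting only a few safe words, which matters for report generation where omitted findings are the main failure mode.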

Originality/value

This work is the first to employ DyCNN as an encoder to extract features from CXR images. It also uses a GRU decoder for language modeling and incorporates an attention mechanism into the model architecture.
