Manas Ranjan Mohanty, Pradeep Kumar Mallick, Annapareddy V N Reddy
Journal: Biomedical Physics & Engineering Express (impact factor 1.3; JCR Q3, Radiology, Nuclear Medicine & Medical Imaging)
Published: 2024-11-06 (Journal Article)
DOI: 10.1088/2057-1976/ad8c46 (https://doi.org/10.1088/2057-1976/ad8c46)
Citations: 0
Abstract
Optimizing pulmonary chest x-ray classification with stacked feature ensemble and swin transformer integration.
This research presents an integrated framework designed to automate the classification of pulmonary chest x-ray images. It leverages convolutional neural networks (CNNs), with a focus on transformer architectures, to improve both the accuracy and efficiency of pulmonary chest x-ray image analysis. A central aspect of the approach is the use of the pre-trained networks VGG16, ResNet50, and MobileNetV2 to create a feature ensemble. A notable innovation is a stacked ensemble technique, which combines the outputs of multiple pre-trained models into a comprehensive feature representation. In the feature ensemble approach, each image is processed individually by the three pre-trained networks, and a pooled image is extracted just before the flatten layer of each model. Three pooled images in 2D grayscale format are thus obtained for each original image. These pooled images are stacked to create a 3D image resembling an RGB image, which serves as classifier input in subsequent analysis stages. Stacking the pooling-layer outputs in this way makes a broader range of features available while keeping the complexity of processing the enlarged feature pool manageable. The study also incorporates the Swin Transformer architecture, known for effectively capturing both local and global features. The Swin Transformer is further optimized using the artificial hummingbird algorithm (AHA), which tunes hyperparameters such as patch size, multi-layer perceptron (MLP) ratio, and channel numbers to maximize classification accuracy. The proposed integrated framework, an AHA-optimized Swin Transformer classifier operating on stacked features, is evaluated on three diverse chest x-ray datasets: VinDr-CXR, PediCXR, and MIMIC-CXR. The observed accuracies of 98.874%, 98.528%, and 98.958%, respectively, are presented as evidence of the robustness and generalizability of the developed model across various clinical scenarios and imaging conditions.
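The stacking step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how each network's pooled output becomes a 2D grayscale image, so the channel averaging and min-max scaling below are assumptions, as are the helper names `to_grayscale_map` and `stack_pooled_maps`. Random arrays of shape 7x7xC stand in for the pooled outputs of VGG16, ResNet50, and MobileNetV2 so that the example runs without model weights.

```python
import numpy as np


def to_grayscale_map(features: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, C) feature map to a 2D map scaled to [0, 1].

    Assumption: channel averaging plus min-max scaling; the abstract does
    not specify how the 2D grayscale pooled image is produced.
    """
    gray = features.mean(axis=-1)
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        # A constant map carries no contrast; return zeros instead of dividing by 0.
        return np.zeros_like(gray)
    return (gray - lo) / (hi - lo)


def stack_pooled_maps(maps: list[np.ndarray]) -> np.ndarray:
    """Stack three equally sized 2D maps into one (H, W, 3) RGB-like array."""
    if len(maps) != 3 or len({m.shape for m in maps}) != 1:
        raise ValueError("expected three 2D maps with identical shapes")
    return np.stack(maps, axis=-1)


# Hypothetical stand-ins for the pooled outputs of the three backbones.
rng = np.random.default_rng(0)
vgg16_out = rng.random((7, 7, 512))
resnet50_out = rng.random((7, 7, 2048))
mobilenetv2_out = rng.random((7, 7, 1280))

stacked = stack_pooled_maps(
    [to_grayscale_map(f) for f in (vgg16_out, resnet50_out, mobilenetv2_out)]
)
print(stacked.shape)  # (7, 7, 3)
```

Each channel of the stacked array holds one backbone's view of the same image, so a downstream classifier receives all three feature sources in a single three-channel input. In practice the stacked array would likely need resizing to the classifier's expected input resolution, a step the abstract does not describe.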
About the journal:
BPEX is an inclusive, international, multidisciplinary journal devoted to publishing new research on any application of physics and/or engineering in medicine and/or biology. Characterized by a broad geographical coverage and a fast-track peer-review process, relevant topics include all aspects of biophysics, medical physics and biomedical engineering. Papers that are almost entirely clinical or biological in their focus are not suitable. The journal has an emphasis on publishing interdisciplinary work and bringing research fields together, encompassing experimental, theoretical and computational work.