Focal modulation network for lung segmentation in chest X-ray images
Authors: ŞABAN ÖZTÜRK, TOLGA ÇUKUR
Journal: Turkish Journal of Electrical Engineering and Computer Sciences (JCR Q4, Computer Science, Artificial Intelligence; impact factor 1.2)
Published: 2023-10-07
DOI: https://doi.org/10.55730/1300-0632.4031
Citations: 0
Abstract
Segmentation of lung regions is key to the automatic analysis of chest X-ray (CXR) images, which play a vital role in detecting various pulmonary diseases. Precise identification of lung regions is a basic prerequisite for disease diagnosis and treatment planning. However, precise lung segmentation is challenging because of variations in anatomical shape and size, strong edges at the rib cage and clavicle, and overlapping anatomical structures resulting from diverse diseases. Although commonly considered the de facto standard in medical image segmentation, the convolutional UNet architecture and its variants fall short in addressing these challenges, primarily because of their limited ability to model long-range dependencies between image features. Vision transformers equipped with self-attention mechanisms excel at capturing long-range relationships, but segmentation of high-resolution images typically adopts either a coarse-grained global self-attention or a fine-grained local self-attention to alleviate the quadratic computational cost, at the expense of performance. This paper introduces a focal modulation UNet model (FMN-UNet) that enhances segmentation performance by effectively aggregating fine-grained local and coarse-grained global relations at a reasonable computational cost. FMN-UNet first encodes CXR images via a convolutional encoder to suppress background regions and extract latent feature maps at a relatively modest resolution. It then leverages global and local attention mechanisms to model contextual relationships across the images. These contextual feature maps are convolutionally decoded to produce segmentation masks. The segmentation performance of FMN-UNet is compared against state-of-the-art methods on three public CXR datasets (JSRT, Montgomery, and Shenzhen). Experiments on each dataset demonstrate the superior performance of FMN-UNet over the baselines.
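The quadratic cost of self-attention mentioned in the abstract can be made concrete with a small counting sketch. It is not taken from the paper: the function name `attention_pair_counts`, the feature-map sizes, and the window size of 8 are illustrative assumptions. It only counts how many query-key pairs a single attention layer would score for global attention versus attention restricted to non-overlapping windows.

```python
def attention_pair_counts(height, width, window):
    """Count query-key pairs scored by one attention layer (single head).

    Hypothetical helper for illustration only; assumes height and width
    are divisible by the window size.
    """
    tokens = height * width
    # Global self-attention: every token is compared with every token.
    global_pairs = tokens * tokens
    # Local (windowed) self-attention: each token is compared only with
    # the window * window tokens inside its own non-overlapping window.
    local_pairs = tokens * window * window
    return tokens, global_pairs, local_pairs


if __name__ == "__main__":
    # Illustrative feature-map sizes, not values reported in the paper.
    for side in (32, 64, 128):
        tokens, global_pairs, local_pairs = attention_pair_counts(side, side, window=8)
        print(f"{side}x{side}: tokens={tokens}, "
              f"global={global_pairs}, local={local_pairs}")
```

Doubling the side length quadruples the token count, so the global count grows sixteenfold while the windowed count grows only fourfold. This is the trade-off the abstract describes: local attention is cheaper but sees less context, which motivates encoding to a modest resolution before modeling context.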
About the journal:
The Turkish Journal of Electrical Engineering & Computer Sciences is published electronically 6 times a year by the Scientific and Technological Research Council of Turkey (TÜBİTAK).
It accepts English-language manuscripts in the areas of power and energy, environmental sustainability and energy efficiency, electronics, industry applications, control systems, information and systems, applied electromagnetics, communications, signal and image processing, tomographic image reconstruction, face recognition, biometrics, speech processing, video processing and analysis, object recognition, classification, feature extraction, parallel and distributed computing, cognitive systems, interaction, robotics, digital libraries and content, personalized healthcare, ICT for mobility, sensors, and artificial intelligence.
Contribution is open to researchers of all nationalities.