Dual-Enhanced-CNN for Few-Shot Object Detection in Remote Sensing Images
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9979831
Wanyue Jiang, C. Wang, Liyue Li, Sheng Wang
Existing convolutional neural networks (CNNs) perform well in object detection on remote sensing images, even though remote sensing images are more complicated than natural images. However, these approaches depend heavily on the quantity and quality of data: when the number of annotated samples shrinks, the performance of existing CNNs degrades sharply. Few-shot object detection (FSOD) can alleviate this problem but still leaves considerable room for improvement. In this work, we propose a Dual-Enhanced-CNN model to further improve detection performance. The main improvements are as follows: 1) we design a weighted cross-image attention that learns interaction information across both images and channels, thereby improving detection on the query image; 2) we design a new adaptive weight loss that focuses more on targets from novel classes and on targets with poor detection performance. We conducted multiple experiments on the large-scale remote sensing dataset DIOR; the higher detection accuracy and relatively stable results demonstrate the superiority of our method.
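The abstract gives no implementation details; the sketch below is only a rough illustration of what a weighted cross-image attention between query and support feature maps could look like. The class name, the 1x1 projections, and the learnable mixing weight are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class CrossImageAttention(nn.Module):
    """Toy cross-image attention: query-image features attend to support-image
    features over spatial positions, and a learnable scalar weights the result
    (an assumption about the flavor of the mechanism, not the authors' code)."""
    def __init__(self, channels):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, 1)
        self.k_proj = nn.Conv2d(channels, channels, 1)
        self.v_proj = nn.Conv2d(channels, channels, 1)
        self.weight = nn.Parameter(torch.zeros(1))   # learned mixing weight

    def forward(self, query_feat, support_feat):
        b, c, h, w = query_feat.shape
        q = self.q_proj(query_feat).flatten(2).transpose(1, 2)     # (b, hw, c)
        k = self.k_proj(support_feat).flatten(2)                   # (b, c, hw_s)
        v = self.v_proj(support_feat).flatten(2).transpose(1, 2)   # (b, hw_s, c)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)             # (b, hw, hw_s)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return query_feat + self.weight * out                      # enhanced query features
```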
{"title":"Dual-Enhanced-CNN for Few-Shot Object Detection in Remote Sensing Images","authors":"Wanyue Jiang, C. Wang, Liyue Li, Sheng Wang","doi":"10.1109/CISP-BMEI56279.2022.9979831","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979831","url":null,"abstract":"Existing convolutional neural networks (CNNs) still perform excellently in object detection over remote sensing images, although remote sensing images are more complicated than natural images. But these approaches are very dependent on the quality and quantity of data. when the number of annotated samples gets smaller, the performance of existing CNNs sharply gets worse. Few-shot object detection (FSOD) can alleviate this problem but still has a lot of improvement space. In this work, to further improve the detection performance, We propose our Dual-Enhanced-CNN model. And the main improvements are as follows: 1) We design a weighted cross image attention to learn the interaction information across both images and channels and then improve the detection capabilities of the query image. 2) We design a new adaptive weight loss to focus more on the targets from novel classes and the targets with poor detection performance. We have conducted multiple experiments on the large-scale remote sensing dataset named DIOR. And the higher detection accuracy and relatively stable experimental performance prove the superiority of our method.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130880763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of Heart Failure Using a Convolutional Neural Network via ECG Signals
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980118
Jad Botros, F. Mourad-Chehade, D. Laplanche
Heart failure (HF) is a chronic heart condition that increases mortality, morbidity, and healthcare costs. The electrocardiogram (ECG) is a noninvasive and straightforward diagnostic tool that can reveal detectable changes in HF. Because of their small amplitude and short duration, these changes can be subtle and misclassified during manual interpretation or analysis by clinicians. This paper reports a 7-layer deep convolutional neural network (CNN) model for automatic HF detection. The proposed CNN model requires only minimal pre-processing of ECG signals and does not require any engineered features. The model is trained and tested on an unbalanced and a balanced dataset extracted from the MIT-BIH and BIDMC databases, achieving an accuracy of 99.73%, a sensitivity of 99.58%, and a specificity of 99.83% on the unbalanced dataset, and an accuracy of 99.26%, a sensitivity of 99.37%, and a specificity of 99.12% on the balanced dataset.
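The exact 7-layer architecture is not reproduced here; as a hedged illustration, a minimal 1-D CNN for classifying fixed-length ECG segments might be assembled as below. Layer widths, kernel sizes, and the segment length of 1000 samples are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

def make_ecg_cnn(num_classes=2):
    """Minimal 1-D CNN for ECG segment classification (illustrative only;
    not the 7-layer configuration reported in the paper)."""
    return nn.Sequential(
        nn.Conv1d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool1d(2),
        nn.Conv1d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool1d(2),
        nn.Conv1d(32, 64, kernel_size=5), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
        nn.Flatten(),
        nn.Linear(64, num_classes),
    )

logits = make_ecg_cnn()(torch.randn(8, 1, 1000))   # 8 ECG segments of 1000 samples each
```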
{"title":"Detection of Heart Failure Using a Convolutional Neural Network via ECG Signals","authors":"Jad Botros, F. Mourad-Chehade, D. Laplanche","doi":"10.1109/CISP-BMEI56279.2022.9980118","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980118","url":null,"abstract":"Heart failure (HF) is a chronic heart condition that increases mortality, morbidity, and healthcare costs. The electrocardiogram (ECG) is a noninvasive and straightforward diagnostic tool that can reveal detectable changes in HF. Because of their small amplitude and duration, these changes can be subtle and potentially misclassified during manual interpretation or when analyzed by clinicians. This paper reports a 7 -layer deep convolutional neural network (CNN) model for HF automatic detection. The proposed CNN model requires only minimal pre-processing of ECG signals and does not require any engineered features. The model is trained and tested using an unbalanced and a balanced datasets extracted from the MIT-BIH and the BIDMC databases, achieving an accuracy of 99.73%, a sensitivity of 99.58%, and a specificity of 99.83% when the dataset is unbalanced and an accuracy of 99.26%, a sensitivity of 99.37%, and a specificity of 99.12% when the dataset is balanced.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131169899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design of a High-Precision Simulation System for Multi-mode Spaceborne SAR Echo Generation
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9979924
Tianyu Lu, Zhan Xu, Xiaoying Chen, Yanqing Huang, Cheng Wang
High-precision spaceborne SAR echo simulation is crucial for spaceborne SAR system design and for improving signal processing algorithms. In this paper, an equivalent-scatterer method is proposed to generate SAR echo data. To meet the requirements of high-precision modeling and simulation of a spaceborne SAR system, a novel system model is proposed together with related simulation methods. Various types of errors present in the system are analyzed, and methods for simulating these errors are presented as well. Finally, spaceborne SAR echoes in stripmap and spotlight modes are simulated and the echo data are processed with the Chirp-Scaling algorithm, which verifies the high precision of the spaceborne SAR echo generation and simulation.
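As a toy illustration of echo generation for a single equivalent scatterer (one range line at baseband, with placeholder radar parameters and none of the error sources, antenna patterns, or azimuth modeling discussed in the paper):

```python
import numpy as np

# Single equivalent point-scatterer range echo at baseband (illustrative values only).
c  = 3.0e8           # speed of light, m/s
fs = 100e6           # range sampling rate, Hz
Tp = 10e-6           # transmitted pulse width, s
B  = 60e6            # chirp bandwidth, Hz
Kr = B / Tp          # chirp rate, Hz/s
fc = 5.4e9           # carrier frequency, Hz (assumed C-band)

R_near = 850e3               # slant range at the start of the receive window, m
R_tgt  = R_near + 1500.0     # slant range of the point scatterer, m

t   = np.arange(0.0, 40e-6, 1.0 / fs)     # fast time measured from the window start
tau = 2.0 * (R_tgt - R_near) / c          # target delay inside the receive window

def chirp(x):
    """Baseband linear-FM pulse of width Tp centred on x = 0."""
    return (np.abs(x) <= Tp / 2) * np.exp(1j * np.pi * Kr * x ** 2)

echo = chirp(t - tau - Tp / 2) * np.exp(-1j * 4 * np.pi * fc * R_tgt / c)
```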
{"title":"Design of a High-Precision Simulation System for Multi-mode Spaceborne SAR Echo Generation","authors":"Tianyu Lu, Zhan Xu, Xiaoying Chen, Yanqing Huang, Cheng Wang","doi":"10.1109/CISP-BMEI56279.2022.9979924","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979924","url":null,"abstract":"High precision spaceborne SAR radar echo simulation is crucial for spaceborne SAR system design and signal processing algorithm improvement. In this paper, an equivalent scatters SAR echo generation method is proposed to generate SAR echo data. According to the requirements of high-precision modeling and simulation of spaceborne SAR system, a novel system modeling is proposed as well as some related simulation methods. Various types of errors existed in the system are analyzed. Methods for error simulation are presented as well. Finally, spaceborne SAR echoes of stripmap mode and spotlight mode are simulated and the echo data are processed with Chirp-Scaling algorithm, which verifies the high precision spaceborne SAR echo generation simulation.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131258970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Gated Spatial-Channel Transformer Network for the Prediction of Molecular Properties
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980166
Jiahao Shen, Qiang Tong, Zhanqi Cui, Zanqiang Dong, Xiulei Liu
Although spatial features of molecules have been widely used for molecular property prediction, the importance of interactive features has recently come to the fore. In a voxel-based molecular representation, each voxel contains the distribution of atoms, while each channel corresponds to one atomic type; the interaction between multiple atoms is therefore contained in the channel information. In this work, we propose a gated spatial-channel transformer (GatedSCT) network for molecular property prediction. We design a channel transformer to capture interactive features from the channels, which reflect the relationships between multiple atoms, and a spatial transformer to extract spatial features of molecules. A gated mechanism merges these two parts efficiently. Because the proposed network takes advantage of channel information, experiments show that it predicts molecular properties more accurately than other networks.
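The gating is not specified in the abstract; below is a minimal sketch of one plausible gated merge of the spatial-branch and channel-branch token features. The gating form and tensor layout are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Toy gated merge of spatial-branch and channel-branch token features
    (a guess at the gating form, not the authors' implementation)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, spatial_feat, channel_feat):
        # Both inputs: (batch, tokens, dim) outputs of the two transformer branches.
        g = self.gate(torch.cat([spatial_feat, channel_feat], dim=-1))
        return g * spatial_feat + (1.0 - g) * channel_feat
```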
{"title":"A Gated Spatial-Channel Transformer Network for the Prediction of Molecular Properties","authors":"Jiahao Shen, Qiang Tong, Zhanqi Cui, Zanqiang Dong, Xiulei Liu","doi":"10.1109/CISP-BMEI56279.2022.9980166","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980166","url":null,"abstract":"Although spatial features of molecules have been widely used for molecular property prediction, the importance of interactive features is coming to the surface these days. By using molecular voxel-based representation, each voxel contains the distribution of atoms, while each channel corresponds to one atomic type. Thus the interaction between multiple atoms is actually contained in the channel information. In this work, we propose a gated spatial-channel transformer (GatedSCT) network for molecular property prediction. We design a channel transformer to capture interactive features from channels, which indicates the relationship between multiple atoms. Also, a spatial transformer is used to extract spatial features of molecules. We apply a gated mechanism to merge these two parts efficiently. Since our proposed network takes advantage of channel information, the experiments show that it can predict molecular properties more accurately than other networks.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133033610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FAD-Net: Fake Images Detection and Generalization Based on Frequency Domain Transformation
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980271
Xiaoning Liu, Jinhong Liu, Peiyao Guo, Dongcheng Tuo, Shaotong Tian, Yi Jiang
With the continuous development of neural network technology, methods for generating fake images are steadily improving. More and more fake photos and face-swapping videos appear on major social media platforms, raising concerns about reputation security, information security, and the guidance of public opinion. At present, spatial-domain detection models achieve excellent results, but most of them require large training sets. Meanwhile, frequency-domain detection models mostly rely on complex feature extraction operations, and models of both kinds typically detect only a single fake-image generation method. Given these two points, this paper makes the following contributions: based on the common characteristics of fake-image generation methods and on the attention mechanism, we design FAD-Net (Frequency-domain Attention Detection Network), which is applicable to most fake-image generation methods. We use the frequency-domain image as the network input to train the detector. Good detection results are obtained on 11 fake-image generation methods, including Deepfakes and the GAN series. Compared with the best spatial-domain detection model, FAD-Net achieves better detection generalization with a smaller training set and shorter training time, demonstrating the advantage of frequency information for generalizable fake image detection.
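The abstract does not state which frequency transform is used; as a hedged example, a common preprocessing step in this line of work is a per-channel 2-D DCT (or FFT) with log scaling, fed to the detector instead of raw pixels. The function below is illustrative only and may differ from FAD-Net's actual transform.

```python
import numpy as np
from scipy.fft import dctn

def to_frequency_domain(image):
    """Map an H x W x 3 uint8 image to a log-scaled per-channel 2-D DCT
    representation (illustrative preprocessing; FAD-Net's actual transform
    may differ)."""
    img = image.astype(np.float32) / 255.0
    freq = np.stack([dctn(img[..., ch], norm="ortho") for ch in range(3)], axis=-1)
    return np.log1p(np.abs(freq))
```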
{"title":"FAD-Net: Fake Images Detection and Generalization Based on Frequency Domain Transformation","authors":"Xiaoning Liu, Jinhong Liu, Peiyao Guo, Dongcheng Tuo, Shaotong Tian, Yi Jiang","doi":"10.1109/CISP-BMEI56279.2022.9980271","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980271","url":null,"abstract":"With the continuous development of neural network technology, the generation methods of fake images are gradually improved. More and more faked photos and face changing videos appear on major social media platforms, causing people to pay attention to their reputation security, information security, and public opinion guidance. At present, the spatial detection model has excellent results. But most of them need a large number of training sets as support. Simultaneously, the frequency detection model primarily uses complex feature extraction operations, and most of the two detection models only detect a single fake image generation method. Given the above two points, this paper makes the following work: Based on the common issues and attention mechanism of fake image generation methods on the network, FAD-Net (Frequency-domain Attention Detection Network) is designed, which is suitable for most fake image generation methods. We use the frequency domain image as the network input to train the detector. Good detection results are obtained on 11 fake image generation methods such as Deepfakes and Gan series. Compared with the best spatial detection model, FAD-Net uses a smaller training set and shorter training time to get better detection generalization, which shows the superiority of frequency information in fake image detection generalization.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132163125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Power Separation Fast Response Brown-Out Detection Structure
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980313
Shiming Liang, Shengxi Diao
This work presents a brown-out detection (BOD) circuit architecture with the largest number of selectable reference levels reported to date, which can operate over a wide temperature range. Compared with previously reported BOD circuits, the proposed circuit separates the power supply of the brown-out detection and functional circuitry from the external power supply, which gives higher robustness. The proposed architecture is implemented in a 12 nm CMOS process and occupies an area of 0.01 mm². Post-layout simulation shows that the circuit achieves a detection delay within 1 µs over the temperature range of −40 °C to 125 °C.
{"title":"A Power Separation Fast Response Brown-Out Detection Structure","authors":"Shiming Liang, Shengxi Diao","doi":"10.1109/CISP-BMEI56279.2022.9980313","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980313","url":null,"abstract":"This work presents a brown-out detection circuit architecture with the most reported selectable reference levels, which can work in a wide temperature range. Compared with reported BOD circuits, the proposed BOD circuit separates the power supply of brown-out detection and functional circuit from the external power supply, which gives higher robustness. The proposed architecture is implemented in a 12 nm CMOS process, occupying a 0.01 mm- Area. The post layout simulation shows that the circuit can realize a detection delay within 1us in the temperature range of −40-125°C","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124372017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Denoising Low-Dose CT Images Using a Multi-Layer Convolutional Analysis-Based Sparse Encoder Network
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980070
Yanqin Kang, Jin Liu, Tao Liu, Jun Qiang
Low-dose computed tomography (LDCT) images tend to be rather noisy and contain artifacts, yet they are diagnostically useful. One approach to improving the quality of LDCT images is to use deep learning (DL) techniques. DL-based methods produce state-of-the-art performance in low-level medical image restoration tasks but remain difficult to interpret because of their black-box construction. In this paper, we present a simple yet effective LDCT image denoising model that combines the advantages of a residual strategy and a multilayer convolutional analysis-based sparse encoder (CASE). Inspired by convolutional sparse coding (CSC), we constructed a multilayer CASE to capture and represent hierarchical image features and designed CASE-net to achieve improved suppression of LDCT noise and artifacts. Moreover, a hybrid loss function combining mean absolute error (MAE) loss, edge loss, and perceptual loss was used to achieve better denoising. Experiments on the MAYO and UIH datasets demonstrate the performance of our framework: the proposed approach suppresses noise and artifacts while preserving tissue structure in LDCT imaging.
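As a rough sketch of the hybrid loss idea, the snippet below combines an MAE term with an assumed Sobel-based edge term; the edge weight w_edge is a placeholder, and the perceptual (VGG-feature) term used in the paper is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def edge_map(x):
    """Sobel gradient magnitude of a (B, 1, H, W) CT image batch."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def hybrid_loss(pred, target, w_edge=0.1):
    """MAE term plus an assumed Sobel-edge consistency term; the perceptual
    (VGG-feature) term mentioned in the paper is omitted for brevity."""
    return F.l1_loss(pred, target) + w_edge * F.l1_loss(edge_map(pred), edge_map(target))
```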
{"title":"Denoising Low-Dose CT Images Using a Multi-Layer Convolutional Analysis-Based Sparse Encoder Network","authors":"Yanqin Kang, Jin Liu, Tao Liu, Jun Qiang","doi":"10.1109/CISP-BMEI56279.2022.9980070","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980070","url":null,"abstract":"Imaging in the field of low-dose computed tomography (LDCT) tend to be rather noisy and artificial but is diagnostically useful. One approach to improve the quality of LDCT images is to use deep learning (DL) techniques. DL-based methods produce state-of-the-art performance in low-level medical image restoration tasks but remain defect to interpret due to their black-box constructions. In this paper, we present a simple yet effective LDCT image denoising model by combining the advantages of a residual strategy and a multilayer convolutional analysis-based sparse encoder (CASE). Inspired by convolutional sparse coding (CSC), we constructed a multilayer CASE to sufficiently capture and represent hierarchical image features and designed CASE-net to achieve improved LDCT noise artifact suppression. Moreover, a hybrid loss function, e.g. mean absolute error (MAE) loss, edge loss and perceptual loss, was used to achieve better denoising effects. Experiments on the MAYO and UIH datasets demonstrated the performance of our framework. The results prove that the proposed approach can restrain noise and artifacts and maintain tissue structure during the LDCT imaging.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124591489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis and Simulation of Interference Effects on CSK Modulation Systems
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9979910
Lele Guan, Zhan Xu, Lu Tian, Chenrui Shi
As the space environment for information transmission becomes more and more complex, communication accuracy becomes a new challenge. To study the anti-jamming performance of code shift keying (CSK), this paper simulates the performance of CSK and binary phase shift keying (BPSK) under five types of signal interference and discusses the results in terms of the bit error rates (BER) obtained by simulation. Across the different forms of interference, CSK shows better anti-jamming performance than BPSK. Among the interference types, CSK is most robust to multi-tone interference, and its robustness to pulse interference, 50% narrowband interference, single-tone interference, and channel noise decreases in that order.
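A minimal Monte-Carlo sketch of how such a BER-versus-interference study can be set up is shown below for BPSK with a single-tone jammer; the paper's CSK simulation chain and interference parameters are not reproduced, and all values here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def bpsk_ber_single_tone(ebn0_db, jsr_db, n_bits=200_000):
    """Monte-Carlo BER of baseband BPSK in AWGN plus a single-tone jammer
    (one sample per symbol; all parameters are illustrative placeholders)."""
    bits = rng.integers(0, 2, n_bits)
    symbols = 2.0 * bits - 1.0                            # BPSK mapping, Eb = 1
    noise = np.sqrt(1.0 / (2.0 * 10 ** (ebn0_db / 10))) * rng.standard_normal(n_bits)
    jam_amp = np.sqrt(2.0 * 10 ** (jsr_db / 10))          # tone with given jam-to-signal power ratio
    jammer = jam_amp * np.cos(2 * np.pi * 0.1 * np.arange(n_bits) + rng.uniform(0, 2 * np.pi))
    decisions = (symbols + noise + jammer) > 0
    return np.mean(decisions != bits.astype(bool))

print(bpsk_ber_single_tone(ebn0_db=6.0, jsr_db=0.0))
```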
{"title":"Analysis and Simulation of Interference Effects on CSK Modulation Systems","authors":"Lele Guan, Zhan Xu, Lu Tian, Chenrui Shi","doi":"10.1109/CISP-BMEI56279.2022.9979910","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979910","url":null,"abstract":"As the space environment of information transmission becomes more and more complex, the accuracy of communication becomes a new challenge. In order to study the anti-jamming performance of code shift keying (CSK), this paper mainly simulates the performance of CSK and binary phase shift keying (BPSK) under 5 signal interferences, and discusses it according to the bit error rates (BER) obtained by simulation. In the face of different forms of signal interference, CSK has better anti-jamming performance than BPSK. Under various forms of signal interference, CSK has the strongest anti-interference ability to multi-tone interference, and its anti-interference ability to pulse interference, 50% narrowband interference, single-tone interference and channel noise decreases successively.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114344925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Modal Magnetic Resonance Images Segmentation Based on An Improved 3DUNet
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980185
Yitong Luo, Chenxi Li, Yiman Sun, Hong Fan
The 3D U-Net model uses end-to-end training and does not require pre-training, but the limited receptive field of the convolutional kernel makes it difficult to establish explicit long-range dependencies, resulting in poor segmentation accuracy on magnetic resonance (MR) images. This paper presents an improved 3D U-Net architecture, called MMTrans3DUNet, which incorporates a Transformer into 3D U-Net (Trans3DUNet) to segment multi-modal MR images. First, tokenized image blocks from a convolutional neural network (CNN) feature map are encoded by the Transformer as the input sequence to extract global context. Then, the decoder up-samples the encoded features and merges them with high-resolution CNN feature maps to achieve precise localization. Moreover, exploiting the multiple imaging modes of MR, the four modality images (T1, T1ce, T2, FLAIR) are fused and fed into the Trans3DUNet model for training, which overcomes the problem that a single-modal MR image cannot sufficiently delineate the lesion in the relevant area. Experimental results on the BraTS2018 and BraTS2019 datasets show that the MMTrans3DUNet model further improves segmentation efficiency and precision because the image information of the multiple modes is complementary.
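A common way to realize this kind of multi-modal fusion is simply to stack the co-registered modality volumes as input channels; the snippet below sketches that step only. The per-volume z-score normalization and tensor layout are assumptions, not necessarily the paper's exact fusion scheme.

```python
import torch

def fuse_modalities(t1, t1ce, t2, flair):
    """Stack four co-registered MR volumes (each a D x H x W array) into one
    4-channel tensor of shape (1, 4, D, H, W), the usual input layout for a
    multi-modal 3D U-Net (normalization choice is an assumption)."""
    vols = [torch.as_tensor(v, dtype=torch.float32) for v in (t1, t1ce, t2, flair)]
    vols = [(v - v.mean()) / (v.std() + 1e-8) for v in vols]   # per-volume z-score
    return torch.stack(vols, dim=0).unsqueeze(0)
```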
{"title":"Multi-Modal Magnetic Resonance Images Segmentation Based on An Improved 3DUNet","authors":"Yitong Luo, Chenxi Li, Yiman Sun, Hong Fan","doi":"10.1109/CISP-BMEI56279.2022.9980185","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980185","url":null,"abstract":"The 3D U-Net model employs end-to-end training ways and does not demand pre-training process, but the limited acceptance range of convolutional kernel makes it difficult to establish an explicit long-range dependency, resulting in poor segmentation accuracy in magnetic resonance (MR) image. This paper presents an promoted 3D U-Net architecture that incorporates the Transformer in 3D U-Net (Trans3DUNet) to segment multi-modal MR images, called MMTrans3DUNet. Firstly, the tokenized image blocks from a convolutional neural network (CNN) feature mapping are encoded by Transformer as the input sequence to extract the global context. Then, the decoder up-sampling the encoded features and coalesce them in CNN feature mapping with high resolution to achieve exact positioning. Moreover, according to the characteristics of MR images with multiple imaging modes, the four modalities images (t l, t lce, t2, flair) are fused and put into the Trans3DUNet model for training, which can overcome the problem that the single-modal MR image cannot sufficiently subdivide the lesion in the relevant area. The experimental results on the BraTS2018 and BraTS2019 dataset show that MMTrans3DUNet model can further promote the efficiency and precision of segmentation due to the image information of multiple modes which can complement each other.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114819143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MA2-FPN for Tiny Object Detection from Remote Sensing Images
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980328
Saiwei Li, Qiang Tong, Xuhong Liu, Zhanqi Cui, Xiulei Liu
Tiny object detection has been a challenging topic in computer vision in recent years. Moreover, in the remote sensing field, smaller and more densely clustered tiny objects make detection even harder than in ground-based images, so general detectors fail to achieve good performance on tiny objects in remote sensing images. In this paper, we propose a Mask Augmented Attention Feature Pyramid Network (MA2-FPN) to detect tiny objects in remote sensing images. It consists of two modules: an Attention Enhancement Module (AEM) and a Mask Supervision Module (MSM). Specifically, AEM aggregates tiny-target context and spatial feature information through a large-kernel separable convolutional attention mechanism, and MSM supervises AEM through a segmentation attention loss so that attention information is aggregated more accurately while the influence of irrelevant background is suppressed. Experiments on the AI-TOD benchmark show that our MA2-FPN achieves state-of-the-art (SOTA) performance.
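The abstract does not detail AEM; the sketch below follows the general "large kernel attention" pattern (depth-wise convolution, dilated depth-wise convolution, then point-wise convolution used as an attention map), with kernel sizes and dilation chosen as assumptions rather than taken from the paper.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Large-kernel separable convolutional attention: depth-wise conv,
    dilated depth-wise conv, then point-wise conv used as an attention map
    (kernel sizes and dilation are assumptions, not AEM's exact design)."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn   # re-weight the input feature map with the attention map
```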
{"title":"MA2-FPN for Tiny Object Detection from Remote Sensing Images","authors":"Saiwei Li, Qiang Tong, Xuhong Liu, Zhanqi Cui, Xiulei Liu","doi":"10.1109/CISP-BMEI56279.2022.9980328","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980328","url":null,"abstract":"Tiny object detection has been a challenging topic in computer vision recent years. Moreover, in remote sensing field, smaller and clustered tiny objects make its detection more difficult compared to ground-based images. This makes general detectors fail to achieve good performance when facing tiny objects in remote sensing images. In this paper, we propose a Mask Augmented Attention Feature Pyramid Network(MA2-FPN) to detect tiny objects in remote sensing images, which consists of two modules, Attention Enhancement Module(AEM) and Mask Supervision Module(MSM). Specifically, AEM aggregates tiny target context and spatial feature information by large kernel separable convolutional attention mechanism, and MSM supervises AEM through a segmentation attention loss to aggregate attention information more accurately while suppressing the influence of irrelevant background. Experiments based on the AI-TOD benchmark show that our MA2-FPN achieves state-of-the-art(SOTA) level.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116436173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}