Pneumonia is a life-threatening respiratory infection that affects millions of people worldwide, and early, accurate diagnosis is crucial for effective treatment and patient care. In recent years, deep learning has shown considerable promise for automating pneumonia diagnosis from X-ray images. However, the inherent variability of X-ray images and the complexity of pneumonia patterns make high classification accuracy difficult to achieve. In this paper, we propose an approach to pneumonia X-ray image classification based on a multi-model ensemble. The method combines the strengths of diverse deep learning architectures and achieves better classification performance than single models. In extensive experiments on a public and a private dataset, the proposed method improved accuracy by 7.53 and 3.36, respectively. The experimental results indicate that the proposed method is practical to use.
{"title":"Multimodel ensemble-based Pneumonia x-ray image classification","authors":"Guanglong Zheng","doi":"10.1117/12.3014404","DOIUrl":"https://doi.org/10.1117/12.3014404","url":null,"abstract":"Pneumonia is a life-threatening respiratory infection that affects millions of individuals worldwide. Early and accurate diagnosis of pneumonia is crucial for effective treatment and patient care. In recent years, deep learning techniques have shown remarkable promise in automating the diagnosis of pneumonia from X-ray images. However, the inherent variability in X-ray images and the complexity of pneumonia patterns pose significant challenges to achieving high classification accuracy. In this paper, we propose a novel approach for pneumonia X-ray image classification based on multiple model ensemble. Our method leverages the strengths of diverse deep learning architectures and achieves superior classification performance compared to single models. We conducted extensive experiments on both public and private datasets, and the proposed method achieved accuracy improvements of 7.53 and 3.36, respectively. The experimental results indicate that the proposed method has high usability.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"47 4","pages":"129691M - 129691M-5"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
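The pneumonia abstract above does not say how the individual models are combined. As a minimal sketch, assuming soft voting (averaging the class probabilities of each model and taking the most probable class), an ensemble decision could look like the following. The function name `soft_vote` and the three sets of model outputs are hypothetical and only serve as an illustration.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities across models and return argmax labels.

    prob_sets[m][i][c] is model m's probability that image i belongs to class c.
    Soft voting is an assumption; the abstract does not name the fusion rule.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    labels = []
    for i in range(n_samples):
        n_classes = len(prob_sets[0][i])
        # Mean probability of each class over all models for image i.
        mean = [
            sum(prob_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged probability.
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


# Hypothetical outputs of three models on two images
# (class 0 = normal, class 1 = pneumonia).
model_a = [[0.9, 0.1], [0.2, 0.8]]
model_b = [[0.6, 0.4], [0.55, 0.45]]
model_c = [[0.4, 0.6], [0.1, 0.9]]

predictions = soft_vote([model_a, model_b, model_c])
```

In this toy input, one model disagrees with the other two on each image, and the averaged probabilities settle the vote. Weighted averaging or majority voting would be equally plausible choices.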
The FBS_YOLO3 vehicle detection algorithm addresses the challenge of detecting vehicles in unstructured road scenarios where warning information is limited. It builds on the YOLOv3 model to provide multi-scale target detection. First, FBS_YOLO3 adds four convolutional residual structures to the YOLOv3 backbone network to obtain deeper feature information through down-sampling. Second, the feature fusion network is improved with a PAN structure, whose top-down and bottom-up feature fusion enhances the accuracy and robustness of viewpoint recognition. Finally, the anchor boxes are redefined by K-means clustering combined with an intersection-over-union (IoU) loss function, which resolves the mismatch of the predefined anchor box sizes in the YOLOv3 network. Experimental results on a self-built dataset show that FBS_YOLO3 improves mAP by 3.15% over the original network while maintaining a detection rate of 37 fps. FBS_YOLO3 can also accurately detect vehicles, identify viewpoint information, and effectively address the lack of warning information in unstructured road scenarios.
{"title":"FBS_YOLO3 vehicle detection algorithm based on viewpoint information","authors":"Chunbao Huo, Zengwen Chen, Zhibo Tong, Ya Zheng","doi":"10.1117/12.3014408","DOIUrl":"https://doi.org/10.1117/12.3014408","url":null,"abstract":"The FBS_YOLO3 vehicle detection algorithm is a novel solution to the challenge of detecting vehicles in unstructured road scenarios with limited warning information. This algorithm builds upon the YOLOv3 model to deliver advanced multi-scale target detection. Firstly, FBS_YOLO3 incorporates four convolutional residual structures into the YOLOv3 backbone network to obtain deeper feature knowledge via down-sampling. Secondly, the feature fusion network is improved by implementing a PAN network structure which enhances the accuracy and robustness of viewpoint recognition through top-down and bottom-up feature fusion. Lastly, the K-means clustering fusion cross-comparison loss function is utilized to redefine the anchor frame by employing a K-means fusion cross-ratio loss function. This innovative approach solves the issue of mismatching the predetermined anchor frame size of the YOLOv3 network. Experimental results demonstrate that FBS_YOLO3 on a self-built dataset can improve mAP by 3.15% compared with the original network, while maintaining a quick detection rate of 37 fps. Moreover, FBS_YOLO3 can accurately detect vehicles, identify viewpoint information, and effectively solve the warning information insufficiency problem in unstructured road scenarios.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"17 4","pages":"129690S - 129690S-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
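The FBS_YOLO3 abstract mentions redefining anchor boxes with K-means and an overlap-based measure but gives no details. The sketch below shows the anchor clustering commonly used with YOLO-style detectors, where each box is assigned to the center it overlaps most (equivalently, the smallest distance 1 - IoU). It is an assumption that the paper follows this scheme. The deterministic initialisation, the mean update, and the sample box sizes are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a labelled bounding box


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at a common corner, so only width/height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes: List[Box], k: int, iters: int = 50) -> List[Box]:
    """Cluster box sizes into k anchors, assigning each box by highest IoU."""
    # Deterministic initialisation (an assumption): centers spread over
    # the area-sorted list of boxes.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centers = [ordered[int(step * i + step / 2)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as the smallest distance 1 - IoU.
            best = max(range(k), key=lambda j: iou_wh(box, centers[j]))
            clusters[best].append(box)
        new_centers = []
        for j, members in enumerate(clusters):
            if not members:  # keep the old center if a cluster is empty
                new_centers.append(centers[j])
                continue
            new_centers.append((
                sum(b[0] for b in members) / len(members),
                sum(b[1] for b in members) / len(members),
            ))
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return sorted(centers, key=lambda b: b[0] * b[1])


# Hypothetical box sizes in pixels: three small and three large vehicles.
sample_boxes = [(10, 12), (12, 10), (11, 11), (50, 60), (60, 50), (55, 55)]
anchors = kmeans_anchors(sample_boxes, k=2)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is the usual motivation for this choice in anchor selection.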
Diabetic retinopathy (DR) is a chronic progressive disease that impairs eyesight and can cause blindness. Identifying DR and diagnosing its severity is important for the timely diagnosis and treatment of DR patients, for improving people’s quality of life, especially among the elderly, and for improving diagnostic efficiency. In this study, with the goal of grading DR efficiently and accurately, a DR recognition and classification algorithm based on ResNet and transfer learning is proposed. First, the shallow feature extraction module of ResNet-18, which is selected as the backbone network, is used to extract retinal image features, and a fully connected classification structure for DR is designed. Transfer learning is then used to train the network weights and improve the generalization ability of the model. Results show that the accuracy on the training set reaches a level that provides useful guidance for automatic DR diagnosis, and that the method effectively alleviates the problem of low DR classification accuracy.
{"title":"Detection algorithm for diabetic retinopathy based on ResNet and transfer learning","authors":"Weihua Wang, Li Lei","doi":"10.1117/12.3014400","DOIUrl":"https://doi.org/10.1117/12.3014400","url":null,"abstract":"DR (Diabetic retinopathy) a chronic progressive disease which affects eyesight and even causes blindness. It is significance to carry out the identification and severity diagnosis of DR, timely diagnosis and treatment of DR Patients, improve the people’s quality, especially the elderly, and improve the efficiency of diagnosis. In this study, with the goal of efficient and accurate division of DR Levels, a DR Recognition and classification algorithm based on ResNet and transfer learning is proposed. Firstly, shallow feature extraction module of ResNet18 is used to get retinal image feature, and then the fully connected classification structure model of DR Is designed. Then the transfer learning method is combined to train the network weights to improve the generalization ability of the model, ResNet-18 is selected as the backbone network model for feature extracting. Results show that the accuracy of the training set reaches to provide useful guidance for DR Automatic diagnosis, and effectively alleviates the problem of low accuracy of DR Classification","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"57 3","pages":"129690G - 129690G-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To obtain higher imaging quality, the dark level generated during the operation of CMOS image sensors (CIS) needs to be corrected. In this paper, a dark level correction circuit is designed based on a 4T active pixel; it comprises a dark current cancellation circuit and a switched-capacitor amplifier circuit. First, the dark current is collected in real time using the dark pixels at the periphery of the array, and the dark current noise is read out and subtracted from the image signals output by the columns to obtain a more accurate output signal, thus eliminating the dark level caused by the dark current. The switched-capacitor amplifier then samples and amplifies the signals to facilitate subsequent ADC processing. The proposed method is verified with a circuit design in a 110 nm process. The verification results show that, through real-time sampling of the dark pixels at the periphery of the array, the dark level correction circuit can reduce the dark current noise of the exposure stage to more than 85% of the original.
{"title":"Research on dark level correction method for CMOS image sensors","authors":"Yizhe Wang, Zhongjie Guo, Youmei Guo","doi":"10.1117/12.3014385","DOIUrl":"https://doi.org/10.1117/12.3014385","url":null,"abstract":"To obtain higher imaging quality, the dark level generated during the operation of CMOS image sensors (CIS) needs to be corrected. In this paper, a dark level correction circuit is designed based on a 4 T active pixel, which includes a dark current cancellation circuit and a switched capacitor amplifier circuit. First, the dark current is collected in real time by using the dark pixels in the periphery of the face array, and the dark current noise is read out and differed from the image signals output from the columns to obtain a more accurate output signal, thus eliminating the dark level caused by the dark current. Then the switched-capacitor amplifier is used to collect and amplify the signals to facilitate the subsequent ADC processing. Based on the 110 nm process for the proposed method of specific circuit design verification, the verification results show that the dark level correction circuit designed in this paper through a real-time sampling of the dark pixels of the periphery of the array can be reduced to the exposure stage of the dark current noise to more than 85% of the original.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"58 3","pages":"1296916 - 1296916-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
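The dark level correction described in the CMOS image sensor abstract is an analog circuit, but its signal flow (estimate the dark level from peripheral dark pixels, subtract it from the column signals, then amplify) can be illustrated numerically. The following is only a behavioural sketch under stated assumptions. The per-row averaging of dark pixels, the gain of 2.0, and the sample readings are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def correct_row(active: Sequence[float],
                dark: Sequence[float],
                gain: float = 2.0) -> List[float]:
    """Subtract the dark level from one row of column signals, then amplify.

    active: readings of the light-sensitive pixels in the row.
    dark:   readings of the shielded (dark) pixels sampled with the same row.
    gain:   hypothetical gain standing in for the switched-capacitor amplifier.
    """
    # Dark level estimated from the peripheral dark pixels (assumed: mean).
    dark_level = sum(dark) / len(dark)
    # The difference removes the dark-current offset; the gain models the
    # amplification that precedes the ADC.
    return [gain * (signal - dark_level) for signal in active]


# Hypothetical readings in arbitrary units.
row_active = [110, 130, 150]
row_dark = [9, 10, 11]

corrected = correct_row(row_active, row_dark)
```

Because the dark pixels are sampled together with the row being read, the estimate follows changes in temperature and exposure time, which is the point of collecting the dark current in real time.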
Lung cancer has the highest incidence and mortality of any cancer in China and seriously threatens human life. Pulmonary nodules are the main factor leading to lung cancer, and their precise identification plays a crucial role in clinical diagnosis. This paper proposes a lung nodule detection model, based on an improved YOLOv5 network, that incorporates global image information. Comparative experiments verify the accuracy and effectiveness of the model.
{"title":"Detection and recongnition of pulmonary nodules based on convolution neural network","authors":"Qiangchao Shi, Zhibing Shu","doi":"10.1117/12.3014478","DOIUrl":"https://doi.org/10.1117/12.3014478","url":null,"abstract":"Lung cancer is the disease with the highest incidence rate and mortality of cancer in China, which seriously threatens human life safety. Pulmonary nodules are the main factor leading to lung cancer, and their precise identification plays a crucial role in clinical diagnosis. This paper proposes a lung nodule detection model that combines global image information to address issues. The model is based on improved YOLOV5 network. Finally, comparative experiments have verified the accuracy and effectiveness of this model.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"24 1","pages":"129692E - 129692E-7"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactive video object segmentation (iVOS) aims to efficiently produce high-quality segmentation masks of a target object in a video with the help of user interactions. Numerous works have recently been proposed to advance iVOS, but their use of user intent is limited. First, typical modules try to generate the segmentation directly, without further exploring the input interaction, and thus miss valuable information. Second, recent iVOS approaches do not consider the raw interaction information, so the final segmentation results are corrupted by erroneous information carried over from the previous round’s segmentation masks. To address these weaknesses, this paper proposes ISP, an Iterative Segmentation and Propagation based iVOS method that explores user intent more thoroughly. ISP models user intent directly in both the PGI2M module and the TP module. Specifically, ISP first extracts a coarse-grained segmentation mask by analyzing the user’s input and uses this mask as a prior to aid the PGI2M module. ISP then introduces a new interaction-driven self-attention module that recalls the user’s intent in the TP module. Extensive experiments on two public datasets show the superiority of ISP over existing methods.
{"title":"Iterative segmentation and propagation based interactive video object segmentation","authors":"Sihan Luo, Sizhe Yang, Xia Yuan","doi":"10.1117/12.3014487","DOIUrl":"https://doi.org/10.1117/12.3014487","url":null,"abstract":"Interactive video object segmentation (iVOS), which aims to efficiently produce high-quality segmentation masks of the target object in a video with user interactions. Recently, numerous works are proposed to advance the task of iVOS. However, their usages on user intent are limited. First, typical modules usually try to direct generate the segmentation without any further exploration on the input interaction, which misses valuable information. Second, recent iVOS approaches also do not consider the raw interactive information. As a result, the final segmentation results will be poisoned by the erroneous information given by the previous round’s segmentation masks. To solve the aforementioned weaknesses, in this paper, an Iterative Segmentation and Propagation based iVOS method is proposed to conduct better user intent exploration, namely ISP. ISP directly models user intent into the PGI2M module and TP module. Specifically, ISP first extracts a coarse-grained segmentation mask by analyzing the user’s input. Subsequently, this mask is used as a prior to aid the PGI2M module. Secondly, ISP presents a new interaction-driven self-attention module to recall the user’s intent in the TP module. Extensive experiments on two public datasets show the superiority of ISP over existing methods.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"43 5","pages":"129691A - 129691A-10"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As intelligent management of red bayberry orchards continues to improve, the demand for automatic picking and sorting is becoming increasingly apparent. A prerequisite for these automated processes is the ability to quickly identify the maturity of red bayberries by object detection. In this study, we classified red bayberries into 8 maturity levels and achieved an object detection precision of 88.9%. To obtain higher precision, we used a fast object detection model combined with small-object optimization methods and small feature extraction layers.
{"title":"Research on object detection for small objects in agriculture: taking red bayberry as an example","authors":"Shan Hua, Kaiyuan Han, Shuangwei Li, Minjie Xu, Shouyan Zhu, Zhifu Xu","doi":"10.1117/12.3014464","DOIUrl":"https://doi.org/10.1117/12.3014464","url":null,"abstract":"With the continuous improvement of intelligent management level in red bayberry orchards, the demand for automatic picking and automatic sorting is becoming increasingly apparent. The prerequisite for achieving these automated processes is to quickly identify the maturity of red bayberries by object detection. In this study, we classified red bayberry into 8 levels of maturity and achieved an object detection precision of 88.9%. We used a fast object detection model, combined with small object optimization methods and small feature extraction layers to get higher precision.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"59 2","pages":"129692B - 129692B-7"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To address the difficulty of constructing effective features and an effective extraction model for rice extraction from remote sensing images, this work combines feature optimization with a combined deep learning model. Using Sentinel-2A imagery as the data source, a multi-dimensional feature dataset is constructed that includes spectral features, red-edge features, vegetation indices, water indices and texture features. The ReliefF-RFE algorithm is used to optimize the features for rice extraction, and the combined UPerNet-Swin Transformer model extracts rice in the study area based on the optimized features. Comparison with other feature combination schemes and deep learning models demonstrates that: (1) the features optimized by the ReliefF-RFE algorithm give the best segmentation for rice extraction, with accuracy, recall, F1 score and IoU reaching 92.77%, 92.28%, 92.52% and 86.09%, respectively; and (2) compared with the PSPNet, Unet, DeepLabv3+ and original UPerNet models, the combined UPerNet-Swin Transformer model produces fewer misclassifications and omissions under the same optimal feature combination scheme, with the F1 score and IoU increased by 11.12% and 17.46%, respectively.
{"title":"Rice extraction from Sentinel-2A image based on feature optimization and UPerNet:Swin Transformer model","authors":"Yu Wei, Bo Wei, Xianhua Liang, Zhiwei Qi","doi":"10.1117/12.3014406","DOIUrl":"https://doi.org/10.1117/12.3014406","url":null,"abstract":"Starting from the problem that rice extraction from remote sensing images still faces effective feature construction and extraction model, the feature optimization and combined deep learning model are considered. Taking Sentinel-2A image as data source, a multi-dimensional feature data set including spectral features, red edge features, vegetation index, water index and texture features is constructed. The ReliefF-RFE algorithm is used to optimize the features of the data set for rice extraction, and the combined UPerNet-Swin Transformer model is used to extract the rice from the study area based on the optimized features. Comparison with other feature combination schemes and deep learning models demonstrates that: (1) using the optimized features based on the ReliefF-RFE algorithm has the best segmentation effect for rice extraction, which its accuracy, recall rate, F1 score and IoU reach 92.77%, 92.28%, 92.52% and 86.09%, respectively, and (2) compared with PSPNet, Unet, DeepLabv3+ and the original UPerNet models, the combined UPerNet-Swin Transformer model has fewer misclassifications and omissions under the same optimal feature combination schemes, which the F1 score and IoU are increased by 11.12% and 17.46%, respectively","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"15 2-4","pages":"129691L - 129691L-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
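The four scores reported in the rice extraction abstract can be cross-checked against each other. For a single foreground class, F1 is the harmonic mean of precision and recall, and IoU follows from F1 as IoU = F1 / (2 - F1). The sketch below applies these standard identities to the reported figures. It assumes that the first reported value (called accuracy in the abstract) plays the role of precision, which the abstract does not state.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def iou_from_f1(f1: float) -> float:
    """IoU of a single class derived from its F1 score.

    With F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN),
    it follows that IoU = F1 / (2 - F1).
    """
    return f1 / (2 - f1)


# Values reported in the abstract, as fractions. Treating 92.77% as
# precision is an assumption made for this check.
reported_precision = 0.9277
reported_recall = 0.9228

f1 = f1_from_precision_recall(reported_precision, reported_recall)
iou = iou_from_f1(f1)
```

Under this assumption the computed values round to 92.52% and 86.09%, which agrees with the reported F1 score and IoU.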
Brocade weaving has a long history, exquisite patterns and profound cultural connotations. It is an outstanding representative of Chinese silk culture and a prominent emblem of humanity's intangible cultural heritage. Making brocade is a very complicated craft. To detect defects in time during production, an improved SE-SSD fabric defect detection algorithm is proposed to address the low efficiency of defect detection in traditional production, the difficulty of deploying large models, and the shortcomings of DB-YOLOv3. By improving the network structure and optimizing the prior box adjustment mechanism, the algorithm improves the model's feature extraction ability and greatly reduces the parameters and computation of the network. The experimental results show that the SE-SSD algorithm effectively reduces missed detections of linear and weak target defects. Compared with the SSD network, detection accuracy increases by 27.55% to reach 93.08% mAP, detection speed increases to 49 FPS, and the network parameters are reduced by 51.5%, which improves the practicality of the algorithm. The ability to detect small target defects still needs to be improved.
{"title":"Research on brocade defect detection algorithm based on deep learning","authors":"Ning Yun","doi":"10.1117/12.3014538","DOIUrl":"https://doi.org/10.1117/12.3014538","url":null,"abstract":"The brocade weaving craft has a long history, with exquisite patterns and profound cultural connotations. It is an excellent representative of Chinese silk culture and an eye-catching business card in the intangible cultural heritage of mankind. The process of making brocade is a very complicated craft. In order to be able to detect defects in time during the production process, an improved SE-SSD fabric defect detection algorithm is proposed for the low efficiency of defect detection in traditional production, the large model affects the deployment and the shortcomings of DB-YOLOv3. By improving the network structure and optimizing the prior frame adjustment mechanism, the algorithm improves the ability of model feature extraction and greatly reduces the parameters and calculation of the network. The experimental results show that the SE-SSD algorithm effectively improves the missed detection of linear and weak target defects. Compared with the SSD network, the detection accuracy is increased by 27.55%, reaching 93.08% mAP, the detection speed is increased to 49FPS, and the network parameters are reduced. 51.5%, which improves the practicability of the algorithm, and the ability to detect small target defects still needs to be improved.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"12 2","pages":"1296907 - 1296907-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In response to the problem of low accuracy in detecting moving targets in minefield images due to indistinct target features, complex background information, and frequent occlusions, this paper proposes a deep learning-based method for minefield moving target detection. Firstly, a fully dynamic convolutional structure is incorporated into the convolutional block of the backbone feature extraction network to reduce redundant information and enhance feature extraction capability. Secondly, the Swin Transformer network structure is introduced during the feature fusion process to enhance the perception of local geometric information. Finally, a coordinate attention mechanism is added to update the fused feature maps, thus enhancing the network's ability to detect occluded targets and targets in low-light conditions. The proposed algorithm is evaluated on a self-built minefield dataset and the Pascal VOC dataset through ablation experiments, and the results show that it significantly improves the average accuracy of target detection in minefield images.
{"title":"Research on mine moving target detection method based on deep learning","authors":"Jiaheng Zhang, Peng Mei, Yongsheng Yang","doi":"10.1117/12.3014398","DOIUrl":"https://doi.org/10.1117/12.3014398","url":null,"abstract":"In response to the problem of low accuracy in detecting moving targets in minefield images due to indistinct target features, complex background information, and frequent occlusions, this paper proposes a deep learning-based method for minefield moving target detection. Firstly, a fully dynamic convolutional structure is incorporated into the convolutional block of the backbone feature extraction network to reduce redundant information and enhance feature extraction capability. Secondly, the Swin Transformer network structure is introduced during the feature fusion process to enhance the perception of local geometric information. Finally, a coordinate attention mechanism is added to update the fused feature maps, thus enhancing the network's ability to detect occluded targets and targets in low-light conditions. The proposed algorithm is evaluated on a self-built minefield dataset and the Pascal VOC dataset through ablation experiments, and the results show that it significantly improves the average accuracy of target detection in minefield images.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"30 3","pages":"1296926 - 1296926-10"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140512087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}