A Tilt-Angle Face Dataset And Its Validation
Nanxi Wang, Zhongyuan Wang, Zheng He, Baojin Huang, Liguo Zhou, Zhen Han
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506052
Since surveillance cameras are usually mounted at a high position to overlook their targets, tilt-angle faces seen from an overhead view are common in public video surveillance. Face recognition approaches based on deep learning models have achieved excellent performance, but a large gap remains for such overhead surveillance scenarios. Face recognition results depend not only on the structure of the model but also on the completeness and diversity of the training samples. Existing multi-pose face datasets do not provide complete coverage of top-view face samples, so models trained on them cannot deliver satisfactory accuracy. To this end, this paper pioneers a multi-view tilt-angle face dataset (TFD), collected with elaborately devised overhead capture equipment. TFD contains 11,124 face images from 927 subjects, covering a variety of tilt angles in the overhead view. To verify the validity of the constructed dataset, we further conduct comprehensive face detection and recognition experiments using models trained on WiderFace, WebFace, and our TFD, respectively. Experimental results show that TFD substantially improves face detection and recognition accuracy in the top-view setting. TFD is available at https://github.com/huang1204510135/DFD.
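The abstract above describes the dataset itself rather than an algorithm, but a small sketch may help illustrate how a subject-organized face dataset like TFD could be consumed for the detection and recognition experiments. The directory layout (<root>/<subject_id>/*.jpg), the label assignment, and the loader design below are assumptions for illustration, not the published TFD structure.

```python
from pathlib import Path

from torch.utils.data import DataLoader, Dataset
from torchvision.io import read_image

class TiltAngleFaceDataset(Dataset):
    """Loads face crops organized as <root>/<subject_id>/<image>.jpg (assumed layout)."""
    def __init__(self, root: str, transform=None):
        self.samples = []
        self.transform = transform
        subjects = sorted(p for p in Path(root).iterdir() if p.is_dir())
        # One integer identity label per subject directory.
        self.subject_to_label = {p.name: i for i, p in enumerate(subjects)}
        for subject in subjects:
            for img_path in sorted(subject.glob("*.jpg")):
                self.samples.append((img_path, self.subject_to_label[subject.name]))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        image = read_image(str(path)).float() / 255.0   # (C, H, W) tensor in [0, 1]
        if self.transform is not None:
            image = self.transform(image)
        return image, label

# Typical usage (path is hypothetical):
# dataset = TiltAngleFaceDataset("TFD/train")
# loader = DataLoader(dataset, batch_size=64, shuffle=True)
```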
{"title":"A Tilt-Angle Face Dataset And Its Validation","authors":"Nanxi Wang, Zhongyuan Wang, Zheng He, Baojin Huang, Liguo Zhou, Zhen Han","doi":"10.1109/ICIP42928.2021.9506052","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506052","url":null,"abstract":"Since the surveillance cameras are usually mounted at a high position to overlook targets, tilt-angle faces on overhead view are common in the public video surveillance environment. Face recognition approaches based on deep learning models have achieved excellent performance, but there remains a large gap for the overlooking surveillance scenarios. The results of face recognition depend not only on the structure of the model, but also on the completeness and diversity of the training samples. The existing multi-pose face datasets do not cover complete top-view face samples, and the models trained by them thus cannot provide satisfactory accuracy. To this end, this paper pioneers a multi-view tilt-angle face dataset (TFD), which is collected with an elaborately devised overhead capture equipment. TFD contains 11,124 face images from 927 subjects, covering a variety of tilt angles on the overhead view. To verify the validity of the constructed dataset, we further conduct comprehensive face detection and recognition experiments using the corresponding models trained by WiderFace, Webface and our TFD, respectively. Experimental results show that our TFD substantially promotes the face detection and recognition accuracy under the top-view situation. TFD is available at https://github.com/huang1204510135/D FD.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117330933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506077
Jonathan Monsalve, M. Márquez, I. Esnaola, H. Arguello
Compressive Covariance Matrix Estimation from a Dual-Dispersive Coded Aperture Spectral Imager
Compressive covariance sampling (CCS) theory aims to recover the covariance matrix (CM) of a signal, instead of the signal itself, from a reduced set of random linear projections. Although several theoretical works demonstrate the advantages of CCS in compressive spectral imaging tasks, a real optical implementation has not been proposed. Therefore, this paper proposes a compressive spectral sensing protocol for the dual-dispersive coded aperture spectral snapshot imager (DD-CASSI) to directly estimate the covariance matrix of the signal. Specifically, we propose a coded aperture design that allows recasting the vector sensing problem in matrix form, which makes it possible to exploit covariance matrix structure such as positive semidefiniteness, low rank, or a Toeplitz form. Additionally, a low-rank approximation of the image is reconstructed using a Principal Component Analysis (PCA) based method. To test the precision of the reconstruction, several spectral signatures of the image are captured with a spectrometer and compared with those obtained from the covariance-based reconstruction. Results show that the reconstructed spectrum is accurate, with a spectral angle mapper (SAM) error of less than 14°. RGB composites of the spectral image also provide evidence of correct color reconstruction.
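As a rough, self-contained illustration of the covariance-recovery idea behind CCS (not the authors' DD-CASSI sensing protocol), the sketch below estimates a covariance matrix from random linear projections: each projected second moment y_i y_i^T is expressed as a linear function of vec(Sigma) via the Kronecker product Phi_i ⊗ Phi_i, the resulting least-squares system is solved, the estimate is projected onto the positive-semidefinite cone, and a PCA-style low-rank approximation is extracted. All dimensions, the Gaussian sensing matrices, and the synthetic signal model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 16, 6, 400          # signal length, measurements per snapshot, snapshots

# Ground-truth (approximately rank-3) covariance used to draw synthetic signals.
A = rng.standard_normal((N, 3))
Sigma_true = A @ A.T + 0.01 * np.eye(N)

rows, rhs = [], []
for _ in range(K):
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random sensing matrix
    x = rng.multivariate_normal(np.zeros(N), Sigma_true)
    y = Phi @ x                                      # compressive measurement
    # With row-major flattening, (y y^T).ravel() ~ kron(Phi, Phi) @ Sigma.ravel()
    rows.append(np.kron(Phi, Phi))
    rhs.append(np.outer(y, y).ravel())

A_ls = np.vstack(rows)
b_ls = np.concatenate(rhs)
sigma_vec, *_ = np.linalg.lstsq(A_ls, b_ls, rcond=None)
Sigma_hat = sigma_vec.reshape(N, N)
Sigma_hat = 0.5 * (Sigma_hat + Sigma_hat.T)          # symmetrize

# Exploit structure: project onto the positive-semidefinite cone.
w, V = np.linalg.eigh(Sigma_hat)
Sigma_hat = (V * np.clip(w, 0, None)) @ V.T

# PCA-style low-rank approximation from the estimated covariance.
w, V = np.linalg.eigh(Sigma_hat)
top = np.argsort(w)[::-1][:3]
Sigma_lowrank = (V[:, top] * w[top]) @ V[:, top].T

err = np.linalg.norm(Sigma_hat - Sigma_true) / np.linalg.norm(Sigma_true)
print(f"relative covariance error: {err:.3f}")
```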
{"title":"Compressive Covariance Matrix Estimation from a Dual-Dispersive Coded Aperture Spectral Imager","authors":"Jonathan Monsalve, M. Márquez, I. Esnaola, H. Arguello","doi":"10.1109/ICIP42928.2021.9506077","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506077","url":null,"abstract":"Compressive covariance sampling (CCS) theory aims to recover the covariance matrix (CM) of a signal, instead of the signal itself, from a reduced set of random linear projections. Although several theoretical works demonstrate the CCS theory’s advantages in compressive spectral imaging tasks, a real optical implementation has no been proposed. Therefore, this paper proposes a compressive spectral sensing protocol for the dual-dispersive coded aperture spectral snapshot imager (DD-CASSI) to directly estimate the covariance matrix of the signal. Specifically, we propose a coded aperture design that allows recasting the vector sensing problem into matrix form, which enables to exploit the covariance matrix structure such as positive-semidefiniteness, low-rank, or Toeplitz. Additionally, a low-rank approximation of the image is reconstructed using a Principal Components Analysis (PCA) based method. In order to test the precision of the reconstruction, some spectral signatures of the image are captured with a spectrometer and compared with those obtained in the reconstruction using the covariance matrix. Results show the reconstructed spectrum is accurate with a spectral angle mapper (SAM) of less than 14°. RGB image composites of the spectral image also provide evidence of a correct color reconstruction.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115989694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506228
Huigang Zhang, Liuan Wang, Jun Sun
Knowledge-Based Reasoning Network For Object Detection
Mainstream object detection algorithms recognize object instances individually and do not consider the high-level relationships among objects in context. This inevitably leads to biased detection results, owing to the lack of the commonsense knowledge that humans often use to assist object identification. In this paper, we present a novel reasoning module that endows current detection systems with the power of commonsense knowledge. Specifically, we use a graph attention network (GAT) to represent the knowledge among objects, covering both visual and semantic relations. Through iterative updates of the GAT, the object features are enriched. Experiments on the COCO detection benchmark indicate that our knowledge-based reasoning network achieves consistent improvements over various CNN detectors. We obtain 1.9 and 1.8 points higher Average Precision (AP) than Faster R-CNN and Mask R-CNN, respectively, when using ResNet50-FPN as the backbone.
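The abstract does not spell out the reasoning module, so the following is a minimal, hypothetical single-head graph-attention layer of the kind a GAT-based module builds on, showing how per-object RoI features could be enriched by attending over related objects. The feature dimensions, the fully connected relation graph, and the layer design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head GAT layer that refines per-object features via their relations."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)   # a^T [W h_i || W h_j]

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (num_objects, in_dim); adj: (num_objects, num_objects) 0/1 relation mask,
        # assumed to have at least one neighbor per object.
        z = self.proj(h)                                     # (N, out_dim)
        n = z.size(0)
        zi = z.unsqueeze(1).expand(n, n, -1)                 # features of node i
        zj = z.unsqueeze(0).expand(n, n, -1)                 # features of node j
        e = F.leaky_relu(self.attn(torch.cat([zi, zj], dim=-1)).squeeze(-1), 0.2)
        e = e.masked_fill(adj == 0, float("-inf"))           # attend only over related objects
        alpha = torch.softmax(e, dim=-1)                     # attention coefficients
        return F.elu(alpha @ z)                              # aggregated, enriched features

# Toy usage: 5 detected objects with 256-d RoI features, fully connected relation graph.
feats = torch.randn(5, 256)
adj = torch.ones(5, 5)
layer = GraphAttentionLayer(256, 256)
refined = layer(feats, adj)                                  # same shape, context-enriched
print(refined.shape)
```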
{"title":"Knowledge-Based Reasoning Network For Object Detection","authors":"Huigang Zhang, Liuan Wang, Jun Sun","doi":"10.1109/ICIP42928.2021.9506228","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506228","url":null,"abstract":"The mainstream object detection algorithms rely on recognizing object instances individually, but do not consider the high-level relationship among objects in context. This will inevitably lead to biased detection results, due to the lack of commonsense knowledge that humans often use to assist the task for object identification. In this paper, we present a novel reasoning module to endow the current detection systems with the power of commonsense knowledge. Specifically, we use graph attention network (GAT) to represent the knowledge among objects. The knowledge covers visual and semantic relations. Through the iterative update of GAT, the object features can be enriched. Experiments on the COCO detection benchmark indicate that our knowledge-based reasoning network has achieved consistent improvements upon various CNN detectors. We achieved 1.9 and 1.8 points higher Average Precision (AP) than Faster-RCNN and Mask-RCNN respectively, when using ResNet50-FPN as backbone.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"493 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123562959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506552
Christopher B. Kuhn, M. Hofbauer, Ziqi Xu, G. Petrovic, E. Steinbach
Pixel-Wise Failure Prediction For Semantic Video Segmentation
We propose a pixel-accurate failure prediction approach for semantic video segmentation. The proposed scheme improves on previously proposed failure prediction methods, which have so far disregarded the temporal information in videos. Our approach consists of two main steps. First, we train an LSTM-based model to detect spatio-temporal patterns that indicate pixel-wise misclassifications in the current video frame. Second, we use sequences of failure predictions to train a denoising autoencoder that both refines the current failure prediction and predicts future misclassifications. Since public data sets for this scenario are limited, we introduce the large-scale densely annotated video driving (DAVID) data set, generated using the CARLA simulator. We evaluate our approach on the real-world Cityscapes data set and the simulator-based DAVID data set. Our experimental results show that spatio-temporal failure prediction outperforms single-image failure prediction by up to 8.8%. Refining the prediction using a sequence of previous failure predictions further improves performance by a significant 15.2% and allows misclassifications in future frames to be predicted accurately. While we focus our study on driving videos, the proposed approach is general and can easily be applied in other scenarios as well.
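As a sketch of the LSTM-based first step, the code below runs a minimal convolutional LSTM cell over per-frame segmentation confidence maps and predicts a per-pixel failure probability with a 1x1 head. The input encoding (19-channel softmax maps), channel counts, and the cell design are assumptions for illustration; the denoising-autoencoder refinement stage is omitted.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell operating on per-frame feature maps."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

class PixelFailurePredictor(nn.Module):
    """Predicts a per-pixel misclassification probability for the latest frame."""
    def __init__(self, in_ch: int, hid_ch: int = 32):
        super().__init__()
        self.cell = ConvLSTMCell(in_ch, hid_ch)
        self.head = nn.Conv2d(hid_ch, 1, kernel_size=1)      # failure logit per pixel

    def forward(self, seq):
        # seq: (batch, time, channels, H, W), e.g. per-frame segmentation softmax maps
        b, t, _, hgt, wdt = seq.shape
        h = seq.new_zeros(b, self.cell.hid_ch, hgt, wdt)
        c = torch.zeros_like(h)
        for step in range(t):
            h, c = self.cell(seq[:, step], (h, c))
        return torch.sigmoid(self.head(h))                    # (batch, 1, H, W) failure map

# Toy usage: 4-frame sequence of 19-channel maps (Cityscapes-style class count).
model = PixelFailurePredictor(in_ch=19)
failure_prob = model(torch.randn(2, 4, 19, 64, 128))
print(failure_prob.shape)   # torch.Size([2, 1, 64, 128])
```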
{"title":"Pixel-Wise Failure Prediction For Semantic Video Segmentation","authors":"Christopher B. Kuhn, M. Hofbauer, Ziqi Xu, G. Petrovic, E. Steinbach","doi":"10.1109/ICIP42928.2021.9506552","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506552","url":null,"abstract":"We propose a pixel-accurate failure prediction approach for semantic video segmentation. The proposed scheme improves previously proposed failure prediction methods which so far disregarded the temporal information in videos. Our approach consists of two main steps: First, we train an LSTM-based model to detect spatio-temporal patterns that indicate pixel-wise misclassifications in the current video frame. Second, we use sequences of failure predictions to train a denoising autoencoder that both refines the current failure prediction and predicts future misclassifications. Since public data sets for this scenario are limited, we introduce the large-scale densely annotated video driving (DAVID) data set generated using the CARLA simulator. We evaluate our approach on the real-world Cityscapes data set and the simulator-based DAVID data set. Our experimental results show that spatiotemporal failure prediction outperforms single-image failure prediction by up to 8.8%. Refining the prediction using a sequence of previous failure predictions further improves the performance by a significant 15.2% and allows to accurately predict misclassifications for future frames. While we focus our study on driving videos, the proposed approach is general and can be easily used in other scenarios as well.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123694197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506152
Mohamed Dhia Besbes, Hedi Tabia, Yousri Kessentini, Bassem Ben Hamed
Progressive Learning With Anchoring Regularization For Vehicle Re-Identification
Vehicle re-identification (re-ID) aims to automatically determine vehicle identity from a large number of vehicle images captured by multiple cameras. Most existing vehicle re-ID approaches rely on fully supervised learning, which requires large amounts of annotated training data that are expensive to obtain. In this paper, we focus on semi-supervised vehicle re-ID, where each identity has a single labeled sample and multiple unlabeled samples in the training set. We propose a framework that gradually labels vehicle images taken from surveillance cameras. Our framework is based on a deep Convolutional Neural Network (CNN) that is progressively learned using a feature anchoring regularization process. Experiments conducted on several publicly available datasets demonstrate the efficiency of our framework for re-ID tasks. With only 20% labeled data, our approach achieves competitive performance compared to state-of-the-art supervised methods trained on fully labeled data.
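The feature anchoring regularization itself is not detailed in the abstract; the sketch below only illustrates the general progressive-labeling loop on top of pre-extracted embeddings: unlabeled samples are assigned to the nearest labeled identity anchor by cosine similarity when the match is confident, and anchors are updated before the next round. The threshold, the number of rounds, and the anchor update rule are hypothetical choices, not the paper's procedure.

```python
import numpy as np

def progressive_pseudo_label(anchors, anchor_ids, unlabeled, threshold=0.8, rounds=3):
    """Iteratively assign unlabeled embeddings to the closest labeled anchor.

    anchors:    (K, D) one embedding per identity (the single labeled samples)
    anchor_ids: (K,)   identity label of each anchor
    unlabeled:  (M, D) embeddings of unlabeled images
    Returns per-image labels (-1 = still unlabeled).
    """
    labels = np.full(len(unlabeled), -1, dtype=int)
    anchors = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    feats = unlabeled / np.linalg.norm(unlabeled, axis=1, keepdims=True)

    for _ in range(rounds):
        todo = np.where(labels == -1)[0]
        if len(todo) == 0:
            break
        sims = feats[todo] @ anchors.T                  # cosine similarity to each identity
        best = sims.argmax(axis=1)
        conf = sims.max(axis=1)
        accept = conf >= threshold                      # keep only confident assignments
        labels[todo[accept]] = anchor_ids[best[accept]]
        # Move each accepted identity's anchor toward its newly labeled samples.
        for k in np.unique(best[accept]):
            members = feats[todo[accept]][best[accept] == k]
            anchors[k] = 0.5 * anchors[k] + 0.5 * members.mean(axis=0)
            anchors[k] /= np.linalg.norm(anchors[k])
    return labels

# Toy usage with random embeddings standing in for CNN features.
rng = np.random.default_rng(0)
labels = progressive_pseudo_label(rng.standard_normal((10, 128)),
                                  np.arange(10),
                                  rng.standard_normal((50, 128)),
                                  threshold=0.2)
print((labels >= 0).sum(), "images pseudo-labeled")
```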
{"title":"Progressive Learning With Anchoring Regularization For Vehicle Re-Identification","authors":"Mohamed Dhia Besbes, Hedi Tabia, Yousri Kessentini, Bassem Ben Hamed","doi":"10.1109/ICIP42928.2021.9506152","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506152","url":null,"abstract":"Vehicle re-identification (re-ID) aims to automatically find vehicle identity from a large number of vehicle images captured from multiple cameras. Most existing vehicle re-ID approaches rely on fully supervised learning methodologies, where large amounts of annotated training data are required, which is an expensive task. In this paper, we focus our interest on semi-supervised vehicle re-ID, where each identity has a single labeled and multiple unlabeled samples in the training. We propose a framework which gradually labels vehicle images taken from surveillance cameras. Our framework is based on a deep Convolutional Neural Network (CNN), which is progressively learned using a feature anchoring regularization process. The experiments conducted on various publicly available datasets demonstrate the efficiency of our framework in re-ID tasks. Our approach with only 20% labeled data shows interesting performance compared to the state-of-the-art supervised methods trained on fully labeled data.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123960386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506355
Cristiano Santos, Mateus M. Gonçalves, G. Corrêa, M. Porto
Block-Based Inter-Frame Prediction For Dynamic Point Cloud Compression
In recent years, 3D point clouds have gained popularity thanks to technological advances such as increased computational power and the availability of low-cost devices for acquiring 3D information, such as RGB-D sensors. However, raw point clouds demand a large amount of data for their representation, and compression is mandatory to allow efficient transmission and storage. Inter-frame prediction is a widely used approach for achieving high compression rates in 2D video encoders, but the current literature still lacks solutions that efficiently exploit temporal redundancy for point cloud coding. In this work, we propose a novel inter-frame prediction scheme for 3D point cloud compression that exploits temporal redundancies directly in 3D space. Moreover, a mode decision algorithm is proposed to dynamically choose the best encoding mode between inter and intra prediction. The proposed method yields bitrate reductions of 15.6% and 3.5% for geometry and luma information, respectively, with no significant impact on objective quality when compared to the MPEG 3DG solution, G-PCC.
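A minimal sketch of block-based inter/intra mode decision on voxelized occupancy grids follows, assuming cubic blocks, a small translational search range, and occupied-voxel counts as stand-ins for real rate costs; G-PCC's actual coding tools are considerably more elaborate, so this is only a conceptual illustration.

```python
import numpy as np
from itertools import product

BLOCK = 8           # edge length of a cubic block (voxels)
SEARCH = 1          # +/- motion search range in voxels along each axis

def block_cost_intra(block):
    # Proxy for intra cost: number of occupied voxels to signal explicitly.
    return int(block.sum())

def block_cost_inter(block, ref, origin, shift):
    # Proxy for inter cost: size of the occupancy residual against the shifted reference.
    x, y, z = (origin[i] + shift[i] for i in range(3))
    if min(x, y, z) < 0 or any(v + BLOCK > s for v, s in zip((x, y, z), ref.shape)):
        return None
    residual = np.logical_xor(block, ref[x:x+BLOCK, y:y+BLOCK, z:z+BLOCK])
    return int(residual.sum())

def encode_frame(cur, ref):
    """Choose inter or intra per block; returns (origin, mode, best_shift, cost) tuples."""
    decisions = []
    for origin in product(*(range(0, s, BLOCK) for s in cur.shape)):
        block = cur[origin[0]:origin[0]+BLOCK,
                    origin[1]:origin[1]+BLOCK,
                    origin[2]:origin[2]+BLOCK]
        best = ("intra", None, block_cost_intra(block))
        for shift in product(range(-SEARCH, SEARCH + 1), repeat=3):
            cost = block_cost_inter(block, ref, origin, shift)
            if cost is not None and cost < best[2]:
                best = ("inter", shift, cost)
        decisions.append((origin, *best))
    return decisions

# Toy usage: a 32^3 occupancy grid that drifts by one voxel between frames.
rng = np.random.default_rng(0)
ref = rng.random((32, 32, 32)) < 0.05
cur = np.roll(ref, shift=1, axis=0)
modes = encode_frame(cur, ref)
print(sum(m[1] == "inter" for m in modes), "of", len(modes), "blocks coded as inter")
```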
{"title":"Block-Based Inter-Frame Prediction For Dynamic Point Cloud Compression","authors":"Cristiano Santos, Mateus M. Gonçalves, G. Corrêa, M. Porto","doi":"10.1109/ICIP42928.2021.9506355","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506355","url":null,"abstract":"In recent years, 3D point clouds have gained popularity thanks to technological advances such as the increased computational power and the availability of low-cost devices for acquisition of 3D information, like RGBD sensors. However, raw point clouds demand a large amount of data for their representation, and compression is mandatory to allow efficient transmission and storage. Inter-frame prediction is a widely used approach to achieve high compression rates in 2D video encoders, but the current literature still lacks solutions that efficiently exploit temporal redundancy for point cloud encoding. In this work, we propose a novel inter-frame prediction for 3D point cloud compression, which explores temporal redundancies in the 3D space. Moreover, a mode decision algorithm is also proposed to dynamically choose the best encoding mode between inter and intra prediction. The proposed method yields a bitrate reduction of 15.6% and 3.5% for geometry and luma information respectively, with no significant impact in objective quality when compared to the MPEG 3DG solution, called G-PCC.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125772850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506405
Vikanksh Nath, C. Chattopadhyay
S2D2Net: An Improved Approach For Robust Steel Surface Defects Diagnosis With Small Sample Learning
Surface defect recognition is a necessary step for guaranteeing the quality of industrial production. This paper proposes a hybrid model, S2D2Net (Steel Surface Defect Diagnosis Network), for efficient and robust inspection of the steel surface during the manufacturing process. S2D2Net uses a pretrained ImageNet model as a feature extractor and learns a Capsule Network over the extracted features. Experimental results on a publicly available steel surface defect dataset (NEU) show that S2D2Net achieves 99.17% accuracy with minimal training data, a 9.59% improvement over its closest GAN-based competitor. S2D2Net proves its robustness by achieving 94.7% accuracy on a diversity-enhanced dataset, ENEU, a 3.6% improvement over its closest competitor. It delivers better and more robust recognition performance than other state-of-the-art DNN-based detectors.
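The sketch below illustrates the "pretrained feature extractor plus capsule head" idea: CNN feature maps are grouped into primary capsules and routed by agreement into one capsule per defect class. The ResNet-18 backbone (loaded without weights here so the snippet runs offline; ImageNet weights would be loaded in practice, and the weights argument assumes torchvision >= 0.13), the capsule sizes, and the six NEU defect classes are assumptions, not the exact S2D2Net configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

def squash(s, dim=-1, eps=1e-8):
    # Capsule non-linearity: keeps direction, maps length into [0, 1).
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

class CapsuleHead(nn.Module):
    """Class capsules with dynamic routing over primary capsules from CNN features."""
    def __init__(self, in_caps, in_dim, num_classes=6, out_dim=16, iters=3):
        super().__init__()
        self.iters = iters
        self.W = nn.Parameter(0.01 * torch.randn(1, in_caps, num_classes, out_dim, in_dim))

    def forward(self, u):                        # u: (B, in_caps, in_dim)
        u_hat = (self.W @ u[:, :, None, :, None]).squeeze(-1)   # (B, in_caps, classes, out_dim)
        b = torch.zeros(u.size(0), u_hat.size(1), u_hat.size(2), device=u.device)
        for _ in range(self.iters):              # routing by agreement
            c = torch.softmax(b, dim=2)
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))     # (B, classes, out_dim)
            b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
        return v.norm(dim=-1)                    # capsule length = class score

class S2D2NetSketch(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        backbone = models.resnet18(weights=None)                 # swap in ImageNet weights in practice
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # (B, 512, 7, 7)
        self.head = CapsuleHead(in_caps=(512 // 8) * 7 * 7, in_dim=8, num_classes=num_classes)

    def forward(self, x):
        f = self.features(x)                                     # (B, 512, 7, 7)
        B = f.size(0)
        # Group the 512 channels into 64 primary capsules of dimension 8 per location.
        u = f.view(B, 512 // 8, 8, 7 * 7).permute(0, 1, 3, 2).reshape(B, -1, 8)
        return self.head(squash(u))                              # (B, num_classes) defect scores

model = S2D2NetSketch()
print(model(torch.randn(2, 3, 224, 224)).shape)                  # torch.Size([2, 6])
```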
{"title":"S2D2Net: An Improved Approach For Robust Steel Surface Defects Diagnosis With Small Sample Learning","authors":"Vikanksh Nath, C. Chattopadhyay","doi":"10.1109/ICIP42928.2021.9506405","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506405","url":null,"abstract":"Surface defect recognition of products is a necessary process to guarantee the quality of industrial production. This paper proposes a hybrid model, S2D2Net (Steel Surface Defect Diagnosis Network), for an efficient and robust inspection of the steel surface during the manufacturing process. The S2D2Net uses a pretrained ImageNet model as a feature extractor and learns a Capsule Network over the extracted features. The experimental results on a publicly available steel surface defect dataset (NEU) show that S2D2Net achieved 99.17% accuracy with minimal training data and improved by 9.59% over its closest competitor based on GAN. S2D2Net proved its robustness by achieving 94.7% accuracy on a diversity enhanced dataset, ENEU, and improved by 3.6% over its closest competitor. It has better, robust recognition performance compared to other state-of-the-art DNN-based detectors.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125886062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506661
Parnian Afshar, Shahin Heidarian, F. Naderkhani, M. Rafiee, A. Oikonomou, K. Plataniotis, Arash Mohammadi
Hybrid Deep Learning Model for Diagnosis of COVID-19 Using CT Scans and Clinical/Demographic Data
The unprecedented COVID-19 pandemic has had a remarkable impact on the world and has influenced broad aspects of people's lives since its first emergence in late 2019. The highly contagious nature of COVID-19 has raised the need for deep learning-based diagnostic tools that can identify infected cases at an early stage. Recently, we proposed a fully automated framework based on Capsule Networks, referred to as CT-CAPS, to distinguish COVID-19 infection from normal and Community-Acquired Pneumonia (CAP) cases using chest Computed Tomography (CT) scans. Although CT scans provide a comprehensive picture of lung abnormalities, COVID-19 lung manifestations highly overlap with CAP findings, making their identification challenging even for experienced radiologists. Here, CT-CAPS is augmented with a wide range of clinical/demographic data, including the patient's gender, age, weight, and symptoms. More specifically, we propose a hybrid deep learning model that utilizes both clinical/demographic data and CT scans to classify COVID-19 and non-COVID cases using a Random Forest classifier. The proposed hybrid model identifies the most important predictive factors, increasing the explainability of the model. Experimental results show that the proposed hybrid model improves on the CT-CAPS performance, achieving an accuracy of 90.8%, a sensitivity of 94.5%, and a specificity of 86.0%.
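A minimal sketch of the late-fusion idea is given below: CT-derived feature vectors (standing in for CT-CAPS capsule outputs) are concatenated with clinical/demographic fields and fed to a Random Forest, whose feature importances expose the most predictive factors. The feature dimensions and the synthetic data are placeholders, not the study's actual features or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Placeholder stand-ins: 16-d CT features per scan and 4 clinical/demographic fields.
ct_features = rng.standard_normal((n, 16))                 # e.g. capsule-network outputs
clinical = np.column_stack([
    rng.integers(0, 2, n),          # sex (0/1)
    rng.normal(55, 15, n),          # age (years)
    rng.normal(75, 12, n),          # weight (kg)
    rng.integers(0, 2, n),          # symptomatic (0/1)
])
X = np.hstack([ct_features, clinical])                     # late fusion by concatenation
y = rng.integers(0, 2, n)                                  # COVID vs non-COVID (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))

# Feature importances expose which predictors drive the decision (explainability).
names = [f"ct_{i}" for i in range(16)] + ["sex", "age", "weight", "symptomatic"]
for name, imp in sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1])[:5]:
    print(f"{name}: {imp:.3f}")
```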
{"title":"Hybrid Deep Learning Model For Diagnosis Of Covid-19 Using Ct Scans And Clinical/Demographic Data","authors":"Parnian Afshar, Shahin Heidarian, F. Naderkhani, M. Rafiee, A. Oikonomou, K. Plataniotis, Arash Mohammadi","doi":"10.1109/ICIP42928.2021.9506661","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506661","url":null,"abstract":"The unprecedented COVID-19 pandemic has been remarkably impacting the world and influencing a broad aspect of people’s lives since its first emergence in late 2019. The highly contagious nature of the COVID-19 has raised the necessity of developing deep learning-based diagnostic tools to identify the infected cases in the early stages. Recently, we proposed a fully-automated framework based on Capsule Networks, referred to as the CT-CAPS, to distinguish COVID-19 infection from normal and Community Acquired Pneumonia (CAP) cases using chest Computed Tomography (CT) scans. Although CT scans can provide a comprehensive illustration of the lung abnormalities, COVID-19 lung manifestations highly overlap with the CAP findings making their identification challenging even for experienced radiologists. Here, the CT-CAPS is augmented with a wide range of clinical/demographic data, including patients’ gender, age, weight and symptoms. More specifically, we propose a hybrid deep learning model that utilizes both clinical/demographic data and CT scans to classify COVID-19 and non-COVID cases using a Random Forest Classifier. The proposed hybrid model specifies the most important predictive factors increasing the explainability of the model. The experimental results show that the proposed hybrid model improves the CT-CAPS performance, achieving accuracy of 90.8%, sensitivity of 94.5% and specificity of 86.0%.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125968349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506239
Gozde Sahin, L. Itti
Multi-Task Occlusion Learning for Real-Time Visual Object Tracking
Occlusion handling is one of the important challenges in visual tracking, especially for real-time applications, where further processing for occlusion reasoning may not always be possible. In this paper, an occlusion-aware real-time object tracker is proposed that enhances the baseline SiamRPN model with an additional branch directly predicting the occlusion level of the object. Experimental results on the GOT-10k and VOT benchmarks show that learning to predict occlusion levels end-to-end in this multi-task learning framework helps improve tracking accuracy, especially on frames that contain occlusions. Improvements of up to 7% in EAO are observed on occluded frames, which make up only 11% of the data. The results over all frames also indicate that the model compares favorably with the other trackers.
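The sketch below adds a hypothetical occlusion branch to a SiamRPN-style head: template and search features are combined by depthwise cross-correlation, the usual classification and regression heads are kept, and a pooled branch regresses a scalar occlusion level trained with an extra loss term. Feature sizes, the head design, and the occlusion supervision are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def xcorr_depthwise(search, kernel):
    """Depthwise cross-correlation between search-region and template features."""
    b, c, h, w = search.shape
    out = F.conv2d(search.reshape(1, b * c, h, w),
                   kernel.reshape(b * c, 1, *kernel.shape[2:]), groups=b * c)
    return out.reshape(b, c, out.shape[-2], out.shape[-1])

class OcclusionAwareSiamHead(nn.Module):
    """SiamRPN-style heads plus an extra branch predicting the target's occlusion level."""
    def __init__(self, ch=256, anchors=5):
        super().__init__()
        self.cls = nn.Conv2d(ch, 2 * anchors, 1)        # foreground/background per anchor
        self.reg = nn.Conv2d(ch, 4 * anchors, 1)        # box offsets per anchor
        self.occ = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(ch, 1))      # scalar occlusion level in [0, 1]

    def forward(self, template_feat, search_feat):
        corr = xcorr_depthwise(search_feat, template_feat)
        return self.cls(corr), self.reg(corr), torch.sigmoid(self.occ(corr))

# Toy usage with random tensors standing in for backbone features.
head = OcclusionAwareSiamHead()
t_feat = torch.randn(2, 256, 7, 7)      # template (exemplar) features
s_feat = torch.randn(2, 256, 31, 31)    # search-region features
cls, reg, occ = head(t_feat, s_feat)
print(cls.shape, reg.shape, occ.shape)  # (2, 10, 25, 25) (2, 20, 25, 25) (2, 1)

# Multi-task objective: tracking losses plus a supervised occlusion term.
occ_target = torch.tensor([[0.0], [0.7]])               # annotated occlusion levels
loss = F.binary_cross_entropy(occ, occ_target)          # + cls/reg losses in a full tracker
```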
{"title":"Multi-Task Occlusion Learning for Real-Time Visual Object Tracking","authors":"Gozde Sahin, L. Itti","doi":"10.1109/ICIP42928.2021.9506239","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506239","url":null,"abstract":"Occlusion handling is one of the important challenges in the field of visual tracking, especially for real-time applications, where further processing for occlusion reasoning may not always be possible. In this paper, an occlusion-aware real-time object tracker is proposed, which enhances the baseline SiamRPN model with an additional branch that directly predicts the occlusion level of the object. Experimental results on GOT-10k and VOT benchmarks show that learning to predict occlusion levels end-to-end in this multi-task learning framework helps improve tracking accuracy, especially on frames that contain occlusions. Up to 7% improvement on EAO scores can be observed for occluded frames, which are only 11% of the data. The performance results over all frames also indicate the model does favorably compared to the other trackers.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124765150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506159
Haichao Cao, Shiliang Pu, Wenming Tan
A Novel Method For Segmentation Of Breast Masses Based On Mammography Images
Accurate segmentation of breast masses in mammography images is a key step in the diagnosis of early breast cancer. To cope with the varied shapes and sizes of breast masses, this paper proposes a cascaded UNet architecture, referred to as CasUNet. CasUNet contains six UNet subnetworks whose depth increases from 1 to 6, and the output features of adjacent subnetworks are cascaded. Furthermore, we integrate a channel attention mechanism into CasUNet so that it can focus on the important feature maps. To address the difficulty of segmenting the edges of irregular breast masses, a multi-stage cascaded training method is presented that gradually expands the context information around breast masses to assist the training of the segmentation model. To alleviate the problem of limited training samples, a data augmentation method based on background migration is proposed: the background of unlabeled samples is transferred to labeled samples through the histogram specification technique, thereby improving the diversity of the training data. The above methods have been experimentally verified on two datasets, INbreast and DDSM. Experimental results show that the proposed method obtains competitive segmentation performance.
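The background-migration augmentation relies on histogram specification; the sketch below implements that operation in isolation, remapping a labeled image's intensities so their distribution matches an unlabeled reference (segmentation masks are untouched, so annotations remain valid). Which regions are transferred and how the full pipeline is assembled are not specified in the abstract, so the surrounding usage is hypothetical.

```python
import numpy as np

def histogram_specification(source, reference):
    """Remap source intensities so their distribution matches the reference image."""
    src_shape = source.shape
    src = source.ravel()
    ref = reference.ravel()

    # Empirical CDF of the source (per unique gray level).
    src_values, src_inverse, src_counts = np.unique(src, return_inverse=True,
                                                    return_counts=True)
    src_cdf = np.cumsum(src_counts).astype(np.float64) / src.size

    # Empirical CDF of the reference.
    ref_values, ref_counts = np.unique(ref, return_counts=True)
    ref_cdf = np.cumsum(ref_counts).astype(np.float64) / ref.size

    # For each source quantile, pick the reference gray level at the same quantile.
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_values[src_inverse].reshape(src_shape)

# Toy usage: transfer the intensity statistics of an "unlabeled" image onto a labeled one.
rng = np.random.default_rng(0)
labeled_img = rng.normal(0.4, 0.1, (128, 128)).clip(0, 1)
unlabeled_img = rng.normal(0.6, 0.2, (128, 128)).clip(0, 1)
augmented = histogram_specification(labeled_img, unlabeled_img)
print(augmented.mean(), unlabeled_img.mean())   # means roughly agree after matching
```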
{"title":"A Novel Method For Segmentation Of Breast Masses Based On Mammography Images","authors":"Haichao Cao, Shiliang Pu, Wenming Tan","doi":"10.1109/ICIP42928.2021.9506159","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506159","url":null,"abstract":"The accurate segmentation of breast masses in mammography images is a key step in the diagnosis of early breast cancer. To solve the problem of various shapes and sizes of breast masses, this paper proposes a cascaded UNet architecture, which is referred to as CasUNet. CasUNet contains six UNet subnetworks, the network depth increases from 1 to 6, and the output features between adjacent subnetworks are cascaded. Furthermore, we have integrated the channel attention mechanism based on CasUNet, hoping that it can focus on the important feature maps. Aiming at the problem that the edges of irregular breast masses are difficult to segment, a multi-stage cascaded training method is presented, which can gradually expand the context information of breast masses to assist the training of the segmentation model. To alleviate the problem of fewer training samples, a data augmentation method for background migration is proposed. This method transfers the background of the unlabeled samples to the labeled samples through the histogram specification technique, thereby improving the diversity of the training data. The above method has been experimentally verified on two datasets, INbreast and DDSM. Experimental results show that the proposed method can obtain competitive segmentation performance.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124815449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}