Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506732
Weakly-Supervised Multiple Object Tracking via a Masked Center Point Warping Loss
Sungjoon Yoon, Kyujin Shim, Kayoung Park, Changick Kim
Multiple object tracking (MOT), a popular subject in computer vision with broad application areas, aims to detect and track multiple objects across an input video. However, recent learning-based MOT methods require strong supervision on both the bounding box and the ID of each object in every training frame, which makes labeled data costly to obtain. In this paper, we propose a weakly-supervised MOT framework that accurately tracks multiple objects while being trained without object ID ground-truth labels. Our model is trained only with bounding box annotations, using a novel masked warping loss that drives the network to indirectly learn how to track objects through a video. Specifically, valid object center points in the current frame are warped with the predicted offset vectors and enforced to coincide with the valid object center points in the previous frame. With this approach, we obtain an MOT accuracy on the MOT17 dataset on par with state-of-the-art fully supervised MOT models, which use both bounding boxes and object IDs as ground-truth labels.
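To make the loss concrete, here is a minimal PyTorch sketch of a masked center point warping loss, assuming center points are already paired across frames; the tensor names (centers_cur, offsets, centers_prev, valid_mask) are illustrative placeholders and not taken from the authors' code.

```python
# Minimal sketch of a masked warping loss, assuming current/previous centers are paired.
import torch

def masked_warping_loss(centers_cur, offsets, centers_prev, valid_mask):
    """centers_cur, centers_prev: (N, 2) pixel coords; offsets: (N, 2) predicted
    displacement from current to previous frame; valid_mask: (N,) bool."""
    warped = centers_cur + offsets                    # warp current centers backwards in time
    err = torch.norm(warped - centers_prev, dim=1)    # per-object distance to previous centers
    err = err[valid_mask]                             # keep only valid (masked) center points
    return err.mean() if err.numel() > 0 else centers_cur.sum() * 0.0

# toy usage with two objects
cur = torch.tensor([[10.0, 20.0], [40.0, 5.0]])
prev = torch.tensor([[8.0, 19.0], [41.0, 7.0]])
off = torch.zeros_like(cur, requires_grad=True)
loss = masked_warping_loss(cur, off, prev, torch.tensor([True, True]))
loss.backward()                                       # gradients flow into the predicted offsets
```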
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506196
Modeling Image Quality Score Distribution Using Alpha Stable Model
Yixuan Gao, Xiongkuo Min, Wenhan Zhu, Xiao-Ping Zhang, Guangtao Zhai
In recent years, image quality has generally been described by a mean opinion score (MOS). However, we observe that the quality ratings given to an image by a group of subjects may not follow a Gaussian distribution, so image quality cannot be fully described by a MOS. In this paper, we propose to describe image quality using a parameterized distribution rather than a MOS, and we also propose an objective method to predict the image quality score distribution (IQSD). Specifically, we selected 100 images from the LIVE database and invited a large group of subjects to evaluate their quality. By analyzing the subjective quality ratings, we find that the IQSD can be well modeled by an alpha stable model, which reflects much more information than the MOS. Therefore, we propose an algorithm to model the IQSD described by an alpha stable model. Features are extracted from images based on natural scene statistics, and support vector regressors are trained to predict the IQSD described by the alpha stable model. We validate the proposed IQSD prediction model on the collected subjective quality ratings. Experimental results verify the effectiveness of the proposed algorithm in modeling the IQSD.
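As a rough illustration of the modeling idea (not the authors' pipeline), the SciPy snippet below fits a four-parameter alpha stable distribution to one image's synthetic quality ratings; in the full method these parameters would be regressed from natural scene statistics features with support vector regressors. The ratings are made up, and levy_stable.fit can be slow, so this is for illustration only.

```python
# Fit an alpha stable distribution to one image's (synthetic) subjective ratings.
import numpy as np
from scipy.stats import levy_stable

ratings = np.array([55, 62, 60, 71, 58, 64, 67, 59, 90, 61], dtype=float)  # toy scores
alpha, beta, loc, scale = levy_stable.fit(ratings)
print(f"alpha={alpha:.2f} beta={beta:.2f} loc={loc:.1f} scale={scale:.1f}")
# loc plays a role similar to the MOS, while alpha, beta and scale capture tail
# heaviness and asymmetry that a single MOS cannot express.
```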
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506118
Class Specific Interpretability in CNN Using Causal Analysis
Ankit Yadu, P. Suhas, N. Sinha
A singular problem that mars the wide applicability of machine learning (ML) models is the lack of generalizability and interpretability. The ML community is increasingly working on bridging this gap, prominently through methods that study the causal significance of features, such as the Average Causal Effect (ACE). In this paper, our objective is to utilize the causal analysis framework to measure the significance level of features in a binary classification task. Towards this, we propose a novel ACE-based metric called “Absolute area under ACE (A-ACE)”, which computes the area of the absolute value of the ACE across the permissible levels of intervention. The performance of the proposed metric is illustrated on (i) the ILSVRC (ImageNet) dataset and (ii) the MNIST dataset (~42,000 images) by considering pair-wise binary classification problems. Encouraging results have been observed on these two datasets: the computed metric values are higher (peaking at roughly 10x higher than elsewhere for the ILSVRC dataset and 50% higher for the MNIST dataset) at precisely those locations that human intuition would mark as distinguishing regions. The method captures a quantifiable metric that represents the distinction between the classes learnt by the model. This metric aids in the visual explanation of the model’s prediction and thus makes the model more trustworthy.
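A minimal sketch of the A-ACE idea follows, assuming the ACE of one feature has already been evaluated on a grid of intervention levels; the toy ACE curve is a placeholder, not a result from the paper.

```python
# Area under the absolute ACE curve over the permissible intervention range.
import numpy as np

def a_ace(levels, ace_values):
    """Trapezoid-rule integral of |ACE| over the intervention levels."""
    y = np.abs(np.asarray(ace_values, dtype=float))
    x = np.asarray(levels, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

levels = np.linspace(0.0, 1.0, 11)          # permissible intervention levels for one feature
ace = 0.3 * np.sin(2 * np.pi * levels)      # toy ACE curve
print(a_ace(levels, ace))                   # larger values -> causally more significant feature
```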
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506647
Depth-Assisted Joint Detection Network for Monocular 3D Object Detection
Jianjun Lei, Ting Guo, Bo Peng, Chuanbo Yu
In the past few years, monocular 3D object detection has attracted increasing attention due to its low cost and wide range of applications. In this paper, a depth-assisted joint detection network (MonoDAJD) is proposed for monocular 3D object detection. Specifically, a consistency-aware joint detection mechanism is proposed to jointly detect objects in the image and the depth map, and to exploit the localization information from the depth detection stream to refine the detection results. To obtain more accurate 3D bounding boxes, an orientation-embedded NMS is designed by introducing an orientation confidence prediction and embedding this confidence into the traditional NMS. Experimental results on the widely used KITTI benchmark demonstrate that the proposed method achieves promising performance compared with state-of-the-art monocular 3D object detection methods.
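The orientation-embedded NMS is described only at a high level, so the sketch below simply ranks boxes by the product of the class score and the predicted orientation confidence inside an otherwise standard NMS on 2D boxes; the actual combination rule in MonoDAJD may differ.

```python
# Standard greedy NMS, except boxes are ranked by score * orientation confidence.
import numpy as np

def orientation_embedded_nms(boxes, scores, ori_conf, iou_thr=0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores, ori_conf: (N,). Returns kept indices."""
    order = (scores * ori_conf).argsort()[::-1]   # rank by combined confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = boxes[order[1:]]
        xx1 = np.maximum(boxes[i, 0], rest[:, 0]); yy1 = np.maximum(boxes[i, 1], rest[:, 1])
        xx2 = np.minimum(boxes[i, 2], rest[:, 2]); yy2 = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thr]          # drop boxes overlapping the kept one
    return keep
```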
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506117
Fast and Efficient Microlens-Based Motion Search for Plenoptic Video Coding
T. N. Huu, V. V. Duong, B. Jeon
Motion estimation, which plays an important role in video coding, accounts for much of the encoding computation. In this paper, starting from the ray motion characteristics of lenslet plenoptic video, we derive a new motion search model and propose a fast and efficient microlens-based motion search method. Theoretical analysis and experimental results verify the new model and demonstrate its search efficiency. Under the HEVC random-access configuration, we achieve not only a substantial encoding time reduction (56.7%) but also an average bitrate saving of 1.3% compared to relevant existing works. Under the low-delay configuration, the encoding time reduction and bitrate saving are 23.3% and 2.3%, respectively.
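The paper's ray-motion-derived search model is not reproduced here. As a purely illustrative sketch under the assumption of a fixed microlens pitch, the snippet below performs SAD block matching with candidate displacements restricted to integer multiples of that pitch, which conveys the general flavor of a microlens-aligned search.

```python
# Toy SAD block matching over microlens-pitch-aligned candidate displacements.
import numpy as np

def microlens_aligned_search(cur_block, ref_frame, top_left, pitch=15, radius=2):
    """Search displacements that are integer multiples of the assumed microlens pitch."""
    y0, x0 = top_left
    h, w = cur_block.shape
    best_cost, best_mv = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = y0 + dy * pitch, x0 + dx * pitch
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue                                   # skip out-of-frame candidates
            sad = np.abs(cur_block - ref_frame[y:y + h, x:x + w]).sum()
            if sad < best_cost:
                best_cost, best_mv = sad, (dy * pitch, dx * pitch)
    return best_mv, best_cost

ref = np.random.rand(128, 128)
mv, cost = microlens_aligned_search(ref[30:46, 30:46], ref, (30, 30))
```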
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506495
Physiological Monitoring of Front-Line Caregivers for CV-19 Symptoms: Multi-Resolution Analysis & Convolutional-Recurrent Networks
O. Dehzangi, P. Jeihouni, V. Finomore, A. Rezai
Because COVID-19 is easily transmitted, a crucial step is the effective screening of front-line caregivers, one of the most vulnerable populations, for early signs and symptoms resembling the onset of the disease. Our aim in this paper is to track a combination of biomarkers in a ubiquitous experimental setup that monitors human participants with a mobile app and an unobtrusive wearable ring, recording their physiological indicators and self-reported symptoms, in order to predict the likelihood of viral infection symptoms during the next two days. We propose a multi-resolution signal processing and modeling method to effectively characterize the changes in those physiological indicators. Specifically, we decompose each 1-D windowed input time series into a multi-resolution (i.e., 2-D spectro-temporal) space. We then fit our proposed deep learning architecture, which combines a recurrent neural network (RNN) and a convolutional neural network (CNN), to model the sequence of multi-resolution snapshots in the 3-D time-series space. The CNN extracts the underlying features of each 2-D spectro-temporal snapshot, while the RNN tracks the temporal dynamics of the snapshot sequence to predict the participants’ COVID-19 related symptoms. As the experimental results show, our proposed architecture with the best configuration achieves 87.53% and 95.12% average accuracy in predicting COVID-19 related symptoms.
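A minimal PyTorch sketch of such a CNN+RNN combination is given below: a small CNN encodes each 2-D spectro-temporal snapshot and a GRU models the snapshot sequence before a final classifier. Layer sizes and the input shape are illustrative assumptions, not the paper's configuration.

```python
# Tiny CNN-per-snapshot + GRU-over-time model for spectro-temporal sequences.
import torch
import torch.nn as nn

class SpectroTemporalNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.rnn = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (batch, time, 1, freq_bins, time_bins)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1))      # encode each snapshot independently
        feats = feats.flatten(1).view(b, t, -1)
        _, h = self.rnn(feats)                 # track the snapshot sequence over time
        return self.head(h[-1])

logits = SpectroTemporalNet()(torch.randn(4, 6, 1, 32, 32))  # 4 subjects, 6 snapshots each
```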
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506170
Two-Stream Boosted TCRNet for Range-Tolerant Infra-Red Target Detection
Shah Hassan, Abhijit Mahalanobis
The detection of vehicular targets in infra-red imagery is a challenging task, both due to the relatively few pixels on target and the false alarms produced by the surrounding terrain clutter. It has been previously shown [1] that a relatively simple network (known as TCRNet) can outperform conventional deep CNNs for such applications by maximizing a target to clutter ratio (TCR) metric. In this paper, we introduce a new form of the network (referred to as TCRNet-2) that further improves the performance by first processing target and clutter information in two parallel channels and then combining them to optimize the TCR metric. We also show that the overall performance can be considerably improved by boosting the performance of a primary TCRNet-2 detector, with a secondary network that enhances discrimination between targets and clutter in the false alarm space of the primary network. We analyze the performance of the proposed networks using a publicly available data set of infra-red images of targets in natural terrain. It is shown that the TCRNet-2 and its boosted version yield considerably better performance than the original TCRNet over a wide range of distances, in both day and night conditions.
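As a hedged sketch of a target-to-clutter-ratio style objective (following the general idea in the TCRNet line of work rather than the exact TCRNet-2 formulation), the snippet below divides the mean squared detector response on target chips by that on clutter chips and turns the ratio into a loss to minimize.

```python
# Toy TCR-style training objective: maximize target response energy relative to clutter.
import torch

def tcr_loss(target_responses, clutter_responses, eps=1e-8):
    """target_responses, clutter_responses: 1-D tensors of detector outputs on chips."""
    tcr = (target_responses ** 2).mean() / ((clutter_responses ** 2).mean() + eps)
    return -torch.log(tcr + eps)   # maximizing the TCR metric == minimizing this loss

loss = tcr_loss(torch.randn(16) + 3.0, torch.randn(64) * 0.5)
```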
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506313
SimTrojan: Stealthy Backdoor Attack
Yankun Ren, Longfei Li, Jun Zhou
Recent research indicates that deep learning models are vulnerable to adversarial attacks. The backdoor attack, also called a trojan attack, is a variant of adversarial attacks: a malicious attacker can inject a backdoor into a model during the training phase. As a result, the backdoored model performs normally on clean samples but can be triggered by a backdoor pattern to classify backdoor samples as a wrong target label specified by the attacker. However, the vanilla backdoor attack method causes a measurable difference between clean and backdoor samples in latent space, and several state-of-the-art defense methods exploit this to identify backdoor samples. In this paper, we propose a novel backdoor attack method called SimTrojan, which aims to inject backdoors into models stealthily. Specifically, SimTrojan makes clean and backdoor samples have indistinguishable representations in latent space to evade current defense methods. Experiments demonstrate that SimTrojan achieves a high attack success rate and is undetectable by state-of-the-art defense methods. The study suggests the urgency of building more effective defense methods.
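One way to express such a stealthy objective (an assumption for illustration, not necessarily SimTrojan's exact loss) is to add a latent-feature penalty that pulls triggered samples toward clean samples while both are classified as desired; the toy model and weighting below are placeholders.

```python
# Toy stealthy-backdoor objective: classification loss + latent indistinguishability penalty.
import torch
import torch.nn.functional as F

def stealthy_backdoor_loss(model, x_clean, y_clean, x_trig, y_target, lam=1.0):
    """model(x) must return (logits, latent_features)."""
    logits_c, feat_c = model(x_clean)
    logits_t, feat_t = model(x_trig)
    ce = F.cross_entropy(logits_c, y_clean) + F.cross_entropy(logits_t, y_target)
    latent_gap = (feat_c.mean(dim=0) - feat_t.mean(dim=0)).pow(2).sum()   # pull latents together
    return ce + lam * latent_gap

class ToyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = torch.nn.Linear(8, 16)
        self.fc = torch.nn.Linear(16, 10)
    def forward(self, x):
        feat = torch.relu(self.backbone(x))
        return self.fc(feat), feat

loss = stealthy_backdoor_loss(ToyNet(), torch.randn(4, 8), torch.randint(0, 10, (4,)),
                              torch.randn(4, 8), torch.full((4,), 7), lam=1.0)
```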
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506614
Learning to Restore Images Degraded by Atmospheric Turbulence Using Uncertainty
R. Yasarla, Vishal M. Patel
Atmospheric turbulence can significantly degrade the quality of images acquired by long-range imaging systems by causing spatially and temporally random fluctuations in the refractive index of the atmosphere. These variations in the refractive index cause the captured images to be geometrically distorted and blurry. Hence, it is important to compensate for the visual degradation caused by atmospheric turbulence. In this paper, we propose a deep learning-based approach for restoring a single image degraded by atmospheric turbulence. We use the epistemic uncertainty estimated via Monte Carlo dropout to capture the regions of the image that the network has a hard time restoring. The estimated uncertainty maps are then used to guide the network in producing the restored image. Extensive experiments on synthetic and real images show the significance of the proposed work.
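A minimal sketch of the Monte Carlo dropout step: the restoration network is run several times with dropout kept active, and the per-pixel variance across runs serves as the epistemic uncertainty map that can then guide restoration. The toy network and sample count are placeholders.

```python
# Estimate a per-pixel epistemic uncertainty map with Monte Carlo dropout.
import torch

def mc_dropout_uncertainty(net, degraded, n_samples=8):
    net.train()                       # keep dropout layers stochastic at inference time
    with torch.no_grad():
        preds = torch.stack([net(degraded) for _ in range(n_samples)], dim=0)
    return preds.mean(dim=0), preds.var(dim=0)   # mean restoration, per-pixel uncertainty map

toy = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3, padding=1), torch.nn.Dropout2d(0.2))
restored, uncertainty = mc_dropout_uncertainty(toy, torch.randn(1, 3, 64, 64))
```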
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506359
SPCR: Semi-Supervised Point Cloud Instance Segmentation with Perturbation Consistency Regularization
Yongbin Liao, Hongyuan Zhu, Tao Chen, Jiayuan Fan
Point cloud instance segmentation is steadily improving with the development of deep learning. However, current progress is hindered by the expensive cost of collecting dense point cloud labels. To this end, we propose the first semi-supervised point cloud instance segmentation architecture, called semi-supervised point cloud instance segmentation with perturbation consistency regularization (SPCR), which alleviates the data-hungry bottleneck of existing strongly supervised methods. Specifically, SPCR enforces invariance of the predictions under different perturbations applied to the input point clouds. We first introduce various input perturbation schemes to force the network to be robust and to generalize easily to unseen and unlabeled data. Perturbation consistency regularization is then applied to the instance masks predicted from the various transformed inputs, providing self-supervision for network learning. Extensive experiments on the challenging ScanNet v2 dataset demonstrate that our method achieves competitive performance compared with state-of-the-art fully supervised methods.
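A hedged sketch of the consistency term: the same unlabeled cloud is passed through the network under two random, order-preserving perturbations (jitter and scaling here), and the per-point predictions are pushed to agree. SPCR operates on predicted instance masks with a richer perturbation set; this only illustrates the regularization idea.

```python
# Toy perturbation consistency regularization on an unlabeled point cloud.
import torch
import torch.nn.functional as F

def perturb(points):                       # points: (N, 3); order-preserving perturbation
    jitter = 0.01 * torch.randn_like(points)
    scale = 1.0 + 0.1 * (torch.rand(1) - 0.5)
    return points * scale + jitter

def consistency_loss(net, points):
    p1 = net(perturb(points))              # (N, C) per-point logits under perturbation 1
    p2 = net(perturb(points))              # (N, C) per-point logits under perturbation 2
    return F.mse_loss(p1.softmax(dim=-1), p2.softmax(dim=-1))

toy_net = torch.nn.Sequential(torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 20))
loss = consistency_loss(toy_net, torch.randn(1024, 3))
```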