Context-Aware Hierarchical Feature Attention Network For Multi-Scale Object Detection
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190896
Xuelong Xu, Xiangfeng Luo, Liyan Ma
Multi-scale object detection involves classification and regression of objects with variable scales in an image. How to extract discriminative features is a key point for multi-scale object detection. Recent detectors simply fuse pyramidal features extracted from ConvNets, which neither takes full advantage of useful features nor discards redundant ones. To address this problem, we propose the Context-Aware Hierarchical Feature Attention Network (CHFANet), which focuses on effective multi-scale feature extraction for object detection. Based on the single shot multibox detector (SSD) framework, the CHFANet consists of two components: a context-aware feature extraction (CFE) module that captures rich multi-scale context features, and a hierarchical feature fusion (HFF) module followed by a channel-wise attention model that generates deeply fused attentive features. On the Pascal VOC benchmark, CHFANet achieves 82.6% mAP. Extensive experiments demonstrate that CHFANet outperforms many state-of-the-art object detectors in accuracy without any bells and whistles.
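For illustration, a channel-wise attention model of the kind mentioned above can be realized in the squeeze-and-excitation style. The PyTorch sketch below is a minimal, hypothetical example of such a module; the class name and reduction ratio are our own assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel-wise attention: reweight channels of a fused feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(                     # excitation: per-channel gate
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (N, C, H, W) fused features
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w                                  # attentive features
```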
{"title":"Context-Aware Hierarchical Feature Attention Network For Multi-Scale Object Detection","authors":"Xuelong Xu, Xiangfeng Luo, Liyan Ma","doi":"10.1109/ICIP40778.2020.9190896","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190896","url":null,"abstract":"Multi-scale object detection involves classification and regression assignments of objects with variable scales from an image. How to extract discriminative features is a key point for multi-scale object detection. Recent detectors simply fuse pyramidal features extracted from ConvNets, which does not take full advantage of useful features and drop out redundant features. To address this problem, we propose Context-Aware Hierarchical Feature Attention Network (CHFANet) to focus on effective multi-scale feature extraction for object detection. Based on single shot multibox detector (SSD) framework, the CHFANet consists of two components: the context-aware feature extraction (CFE) module to capture rich multi-scale context features and the hierarchical feature fusion (HFF) module followed with the channel-wise attention model to generate deeply fused attentive features. On the Pascal VOC benchmark, our CHFANet can achieve 82.6% mAP. Extensive experiments demonstrate that the CHFANet outperforms a lot of state-of-the-art object detectors in accuracy without any bells and whistles.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128212949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Deep Learning Framework for 3D Surface Profiling of the Objects Using Digital Holographic Interferometry
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190669
Krishna Sumanth Vengala, Rama Krishna Sai Subrahmanyam Gorthi
Phase reconstruction in Digital Holographic Interferometry (DHI) is widely employed for 3D deformation measurements of object surfaces. The key challenge in phase reconstruction in DHI is estimating the absolute phase from noisy reconstructed interference fringes. In this paper, we propose a novel, efficient deep learning approach for phase estimation from noisy interference fringes in DHI. The proposed approach takes noisy reconstructed interference fringes as input and estimates the 3D deformation field or the object surface profile as output. The 3D deformation field measurement of the object is posed as absolute phase estimation from the noisy wrapped phase, which can be obtained from the reconstructed interference fringes through the arctan function. The proposed deep neural network is trained to predict the fringe order from the noisy wrapped phase through a fully convolutional semantic segmentation network. These predictions are improved by simultaneously minimizing the regression error between the true phase corresponding to the object deformation field and the absolute phase estimated from the predicted fringe order. We compare our method with conventional methods as well as with recent state-of-the-art deep learning phase unwrapping methods. The proposed method outperforms conventional approaches by a large margin, and shows significant improvement even over recently proposed deep learning-based phase unwrapping methods, in the presence of noise as high as 0 dB to -5 dB.
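For context, the relation between wrapped phase, fringe order, and absolute phase used here is the standard one from phase unwrapping (not specific to this paper): the wrapped phase is obtained from the arctan of the imaginary and real parts of the reconstructed field, and the absolute phase adds an integer multiple of 2*pi given by the fringe order. A minimal NumPy illustration:

```python
import numpy as np

def wrapped_phase(field):
    """Wrapped phase in (-pi, pi] from a complex-valued reconstructed field."""
    return np.arctan2(field.imag, field.real)

def absolute_phase(psi, fringe_order):
    """Absolute phase from the wrapped phase and an integer fringe order.
    In the paper the per-pixel fringe order is predicted by a segmentation
    network; here it is simply taken as a given integer array (illustrative only)."""
    return psi + 2.0 * np.pi * fringe_order
```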
{"title":"A Deep Learning Framework for 3D Surface Profiling of the Objects Using Digital Holographic Interferometry","authors":"Krishna Sumanth Vengala, Rama Krishna Sai Subrahmanyam Gorthi","doi":"10.1109/ICIP40778.2020.9190669","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190669","url":null,"abstract":"Phase reconstruction in Digital Holographic Interferometry (DHI) is widely employed for 3D deformation measurements of the object surfaces. The key challenge in phase reconstruction in DHI is in the estimation of the absolute phase from noisy reconstructed interference fringes. In this paper, we propose a novel efficient deep learning approach for the phase estimation from noisy interference fringes in DHI. The proposed approach takes noisy reconstructed interference fringes as input and estimates the 3D deformation field or the object surface profile as the output. The 3D deformation field measurement of the object is posed as the absolute phase estimation from the noisy wrapped phase, that can be obtained from the reconstructed interference fringes through arctan function. The proposed deep neural network is trained to predict the fringe-order through a fully convolutional semantic segmentation network, from the noisy wrapped phase. These predictions are improved by simultaneously minimizing the regression error between the true phase corresponding to the object deformation field and the estimated absolute phase considering the predicted fringe order. We compare our method with conventional methods as well as with the recent state-of-the-art deep learning phase unwrapping methods. The proposed method outperforms conventional approaches by a large margin, while we can observe significant improvement even with respect to recently proposed deep learning-based phase unwrapping methods, in the presence of noise as high as 0dB to -5dB.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127341272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Representation Reconstruction Head for Object Detection
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9191049
Shuyu Miao, Rui Feng, Yuejie Zhang
There are two kinds of detection heads in object detection frameworks. Fully-connected heads help map the learned feature representation to the sample label space, while fully convolutional heads help preserve location-sensitivity information. However, how to enjoy the benefits of both detection heads is still underexplored. In this paper, we propose a generalized Representation Reconstruction Head (RRHead) to break through the limitation that most detection heads focus on one of these advantages while ignoring the other. RRHead enhances multi-scale feature representation for better feature mapping, and employs location-sensitivity representation for better location preservation; these optimize fully-convolutional-based heads and fully-connected-based heads separately. RRHead can be embedded in existing detection frameworks to heighten the rationality and reliability of the detection head representation without any additional modification. Extensive experiments show that our proposed RRHead improves the detection performance of existing frameworks by a large margin on several challenging benchmarks, and achieves new state-of-the-art performance.
{"title":"Representation Reconstruction Head for Object Detection","authors":"Shuyu Miao, Rui Feng, Yuejie Zhang","doi":"10.1109/ICIP40778.2020.9191049","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9191049","url":null,"abstract":"There are two kinds of detection heads in object detection frameworks. Between them, the heads based on full connection contribute to mapping the learned feature representation to the sample label space, while the heads based on full convolution facilitate preserving location sensitivity information. However, to enjoy the benefits from both detection heads is still underexplored. In this paper, we propose a generalized Representation Reconstruction Head (RRHead) to break through the limitation that most detection heads focus on unilateral self-advantage while ignoring another one. RRHead enhances multi scale feature representation for better feature mapping, and employs location sensitivity representation for better location preservation. These optimize fully-convolutional-based heads and fully-connected-based heads separately. RRHead can be embedded in existing detection frameworks to heighten the rationality and reliability of the detection head representation without any additional modification. Extensive experiments show that our proposed RRHead improves the detection performance of the existing frameworks by a large margin on several challenging benchmarks, and achieves new state-of-the-art performance.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129933038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate Terahertz Imaging Simulation With Ray Tracing Incorporating Beam Shape and Refraction
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190937
P. Paramonov, Lars-Paul Lumbeeck, J. D. Beenhouwer, Jan Sijbers
In this paper, we present an approach to realistically simulate terahertz (THz) transmission-mode imaging. We model the THz beam shape and account for refraction of the THz beam at the different media interfaces using ray optics. Our approach does not require prior knowledge of the interfaces; instead, it utilizes the refractive index scalar field. We study the beam shape and refraction effects separately by comparing the resulting sinograms with ones simulated by a Gaussian beam model, as well as with a real acquisition of a plastic object. The proposed forward projection can be utilized in iterative reconstruction algorithms to improve the quality of THz CT images.
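For reference, refraction at a media interface in ray optics follows Snell's law. The sketch below shows the textbook vector form that a ray-optics simulator would apply at each interface; the function name and conventions are our own assumptions and do not reproduce the authors' code.

```python
import numpy as np

def refract(d, n, n1, n2):
    """Refract a unit ray direction d at a surface with unit normal n (pointing
    against the incoming ray), passing from refractive index n1 into n2.
    Returns the refracted unit direction, or None on total internal reflection."""
    r = n1 / n2
    cos_i = -np.dot(n, d)
    sin_t2 = r * r * (1.0 - cos_i * cos_i)
    if sin_t2 > 1.0:                      # total internal reflection
        return None
    cos_t = np.sqrt(1.0 - sin_t2)
    return r * d + (r * cos_i - cos_t) * n
```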
{"title":"Accurate Terahertz Imaging Simulation With Ray Tracing Incorporating Beam Shape and Refraction","authors":"P. Paramonov, Lars-Paul Lumbeeck, J. D. Beenhouwer, Jan Sijbers","doi":"10.1109/ICIP40778.2020.9190937","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190937","url":null,"abstract":"In this paper, we present an approach to realistically simulate terahertz (THz) transmission mode imaging. We model the THz beam shape and account for the refraction of the THz beam at the different media interfaces using ray optics. Our approach does not require prior knowledge on the interfaces, instead it utilizes the refractive index scalar field. We study the beam shape and refraction effects separately by comparing resulting sinograms with the ones simulated by a Gaussian beam model, as well as with a real acquisition of a plastic object. The proposed forward projection can be utilized in iterative reconstruction algorithms to improve the quality of THz CT images.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129006872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantically Supervised Maximal Correlation For Cross-Modal Retrieval
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190873
Mingyang Li, Yongni Li, Shao-Lun Huang, Lin Zhang
With the rapid growth of multimedia data, the cross-modal retrieval problem has attracted a lot of interest in both research and industry in recent years. However, the inconsistency of data distributions across modalities makes this task challenging. In this paper, we propose the Semantically Supervised Maximal Correlation (S2MC) method for cross-modal retrieval, which incorporates semantic label information into the traditional maximal correlation framework. Combined with a maximal correlation based method for extracting unsupervised pairing information, our method effectively exploits supervised semantic information in both the common feature space and the label space. Extensive experiments show that our method outperforms other current state-of-the-art methods on cross-modal retrieval tasks on three widely used datasets.
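For context, the unsupervised maximal correlation underlying this framework is the classic Hirschfeld-Gebelein-Renyi (HGR) maximal correlation between two modalities X and Y; the multivariate form is commonly written as below (standard background, not the paper's exact objective):

```latex
\rho(X;Y) \;=\; \sup_{f,\,g}\ \mathbb{E}\big[f(X)^{\top} g(Y)\big],
\qquad \mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0,\quad
\operatorname{cov}(f(X)) = \operatorname{cov}(g(Y)) = I,
```

where f and g are feature mappings of the two modalities. S2MC additionally injects the semantic labels into this framework; how exactly is described in the paper.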
{"title":"Semantically Supervised Maximal Correlation For Cross-Modal Retrieval","authors":"Mingyang Li, Yongni Li, Shao-Lun Huang, Lin Zhang","doi":"10.1109/ICIP40778.2020.9190873","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190873","url":null,"abstract":"With the rapid growth of multimedia data, the cross-modal retrieval problem has attracted a lot of interest in both research and industry in recent years. However, the inconsistency of data distribution from different modalities makes such task challenging. In this paper, we propose Semantically Supervised Maximal Correlation (S2MC) method for cross-modal retrieval by incorporating semantic label information into the traditional maximal correlation framework. Combining with maximal correlation based method for extracting unsupervised pairing information, our method effectively exploits supervised semantic information on both common feature space and label space. Extensive experiments show that our method outperforms other current state-of-the-art methods on cross-modal retrieval tasks on three widely used datasets.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129160416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accelerated 4D MR Image Reconstruction Using Joint Higher Degree Total Variation and Local Low-Rank Constraints
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9191327
Yue Hu, Disi Lin, Kuangshi Zhao
Four-dimensional magnetic resonance imaging (4D-MRI) can provide 3D tissue properties and temporal profiles at the same time. However, further application of 4D-MRI is limited by the long acquisition time and motion artifacts. We introduce a regularized image reconstruction method, named HDTV-LLR, to recover 4D MR images from their undersampled Fourier coefficients. We adopt three-dimensional higher degree total variation and local low-rank penalties to simultaneously exploit the spatial and temporal correlations of the dataset. In order to solve the resulting optimization problem efficiently, we propose a fast alternating minimization algorithm. The performance of the proposed method is demonstrated on 4D cardiac MR image reconstruction with undersampling factors of 12 and 16. The proposed method is compared with iGRASP and with schemes using either a low-rank or a sparsity constraint alone. Numerical results show that the proposed method enables accelerated 4D-MRI with improved image quality and reduced artifacts.
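One plausible way to write the joint objective described above (the symbols A, b, lambda_1, lambda_2 and the patch operators P_i are our own notation for illustration; the exact formulation is in the paper):

```latex
\min_{x}\; \|Ax - b\|_2^2
\;+\; \lambda_1\,\mathrm{HDTV}_{3\mathrm{D}}(x)
\;+\; \lambda_2 \sum_{i} \|P_i x\|_*
```

Here A would be the undersampled Fourier encoding operator, b the acquired k-space data, HDTV_3D the three-dimensional higher degree total variation penalty, and each P_i extracts a local spatio-temporal patch whose nuclear norm enforces the local low-rank prior.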
{"title":"Accelerated 4d Mr Image Reconstruction Using Joint Higher Degree Total Variation And Local Low-Rank Constraints","authors":"Yue Hu, Disi Lin, Kuangshi Zhao","doi":"10.1109/ICIP40778.2020.9191327","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9191327","url":null,"abstract":"Four-dimensional magnetic resonance imaging (4D-MRI) can provide 3D tissue properties and the temporal profiles at the same time. However, further applications of 4D-MRI is limited by the long acquisition time and motion artifacts. We introduce a regularized image reconstruction method to recover 4D MR images from their undersampled Fourier coefficients, named HDTV-LLR. We adopt the three-dimensional higher degree total variation and the local low-rank penalties to simultaneously exploit the spatial and temporal correlations of the dataset. In order to solve the resulting optimization problem efficiently, we propose a fast alternating minimization algorithm. The performance of the proposed method is demonstrated in the context of 4D cardiac MR images reconstruction with undersampling factors of 12 and 16. The proposed method is compared with iGRASP, and schemes using either low-rank or sparsity constraint alone. Numerical results show that the proposed method enables accelerated 4D-MRI with improved image quality and reduced artifacts.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122374701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Enhanced Local Texture Descriptor for Image Segmentation
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190895
Sheikh Tania, M. Murshed, S. Teng, G. Karmakar
Texture is an indispensable property for developing many vision-based autonomous applications. Compared to colour, the feature dimension of a local texture descriptor is quite large, as dense texture features need to represent the distribution of pixel intensities in the neighbourhood of each pixel. High-dimensional features require additional time for further processing, which often restricts real-time applications. In this paper, a robust local texture descriptor is enhanced by reducing the feature dimension threefold without compromising accuracy in region-based image segmentation applications. The reduction in feature dimension is achieved by exploiting the mean of neighbourhood pixel intensities radially along lines across a certain radius, which eliminates the need for sampling the intensity distribution at three scales. Both the benchmark metric results and the computational time are promising when the enhanced texture feature is used in a region-based hierarchical segmentation algorithm, a recent state-of-the-art technique.
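A toy sketch of the radial averaging idea, for a single pixel at a time; the number of directions and the radius are assumed parameters, and the actual descriptor in the paper differs in its details:

```python
import numpy as np

def radial_means(img, y, x, radius=8, n_dirs=8):
    """Mean intensity along n_dirs radial lines of length `radius` around pixel (y, x)."""
    h, w = img.shape
    means = []
    for k in range(n_dirs):
        theta = 2.0 * np.pi * k / n_dirs
        samples = []
        for r in range(1, radius + 1):
            yy = int(round(y + r * np.sin(theta)))
            xx = int(round(x + r * np.cos(theta)))
            if 0 <= yy < h and 0 <= xx < w:
                samples.append(img[yy, xx])
        means.append(np.mean(samples) if samples else img[y, x])
    return np.array(means)   # one value per direction instead of per (scale, direction)
```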
{"title":"An Enhanced Local Texture Descriptor for Image Segmentation","authors":"Sheikh Tania, M. Murshed, S. Teng, G. Karmakar","doi":"10.1109/ICIP40778.2020.9190895","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190895","url":null,"abstract":"Texture is an indispensable property to develop many vision based autonomous applications. Compared to colour, feature dimension in a local texture descriptor is quite large as dense texture features need to represent the distribution of pixel intensities in the neighbourhood of each pixel. Large dimensional features require additional time for further processing that often restrict real-time applications. In this paper, a robust local texture descriptor is enhanced by reducing feature dimension by three folds without compromising the accuracy in region-based image segmentation applications. Reduction in feature dimension is achieved by exploiting the mean of neighbourhood pixel intensities radially along lines across a certain radius, which eliminates the need for sampling intensity distribution at three scales. Both the results of benchmark metrics and computational time are promising when the enhanced texture feature is used in a region-based hierarchical segmentation algorithm, a recent state-of-the-art technique.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128904178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reliable Temporally Consistent Feature Adaptation for Visual Object Tracking
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190957
Goutam Yelluru Gopal, Maria A. Amer
Correlation Filter (CF) based trackers have been at the forefront of various object tracking benchmarks. The use of multiple features and sophisticated learning methods has increased the accuracy of tracking results. However, the contributions of the features are often fixed throughout the video sequence. Unreliable features lead to erroneous target localization and result in tracking failures. To alleviate this problem, we propose a method for online adaptation of feature weights based on their reliability. Our method also includes a notion of temporal consistency to handle noisy reliability estimates. The two objectives are coupled into a convex optimization problem for robust learning of feature weights. We also propose an algorithm to solve the resulting optimization problem efficiently, without hindering tracking speed. Results on the VOT2018, TC128 and NfS30 datasets show that the proposed method improves the performance of baseline CF trackers.
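One plausible shape for the coupled objective described above, in our own illustrative notation (not necessarily the authors' exact formulation): with reliability scores r_t for the features at frame t and the previous weights w_{t-1},

```latex
\min_{w \in \Delta}\; -\, r_t^{\top} w \;+\; \lambda\, \|w - w_{t-1}\|_2^2,
\qquad \Delta = \{\, w : w_i \ge 0,\ \textstyle\sum_i w_i = 1 \,\}.
```

The first term rewards reliable features, the second enforces temporal consistency against noisy reliability estimates, and the simplex constraint keeps the weights a valid convex combination.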
{"title":"Reliable Temporally Consistent Feature Adaptation for Visual Object Tracking","authors":"Goutam Yelluru Gopal, Maria A. Amer","doi":"10.1109/ICIP40778.2020.9190957","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190957","url":null,"abstract":"Correlation Filter (CF) based trackers have been the frontiers on various object tracking benchmarks. Use of multiple features and sophisticated learning methods have increased the accuracy of tracking results. However, the contribution of features are often fixed throughout the video sequence. Unreliable features lead to erroneous target localization and result in tracking failures. To alleviate this problem, we propose a method for online adaptation of feature weights based on their reliability. Our method also includes the notion of temporal consistency, to handle noisy reliability estimates. The two objectives are coupled to model a convex optimization problem for robust learning of feature weights. We also propose an algorithm to efficiently solve the resulting optimization problem, without hindering tracking speed. Results on VOT2018, TC128 and NfS30 datasets show that proposed method improves the performance of baseline CF trackers.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130584297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coding Of Non-Rectangular Signals With Block-Based Transforms
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9191301
P. Das, N. Horst, M. Wien
This paper presents a transform coding technique for non-rectangular 2-D signals that extends the signal into a rectangular block in order to enable conventional block-based transform coding. The technique could be suitable for coding residuals of prediction blocks using geometric partitioning, which has been adopted into the draft Versatile Video Coding standard. The extension of the non-rectangular signal is found using a sparse solution set generated by applying Orthogonal Matching Pursuit with partitioned transform bases. The method developed in this paper is based on the Discrete Cosine Transform. Results achieved in an experimental setup outside of the video coding loop are presented for signals of triangular and trapezoidal shape, in comparison to the shape-adaptive DCT. Encouraging gains are observed, specifically for larger block sizes, depending on the quantization parameter and the partitioning shape.
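A small, hypothetical sketch of the general idea: fill in the unknown part of a block so that the full block is sparse in a separable 2-D DCT basis, using off-the-shelf OMP. The dictionary construction, lack of atom normalization, and the parameters are simplifications, not the authors' method.

```python
import numpy as np
from scipy.fftpack import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

def extend_block(block, known_mask, n_nonzero=16):
    """Extend the known (e.g. triangular) part of an N x N block to a full block
    that is sparse in the 2-D DCT basis.

    block      : (N, N) array, valid only where known_mask is True.
    known_mask : (N, N) boolean array marking the non-rectangular region.
    """
    N = block.shape[0]
    B1 = idct(np.eye(N), norm='ortho', axis=0)   # columns: 1-D DCT basis waveforms
    D2 = np.kron(B1, B1)                         # separable 2-D DCT dictionary (N*N, N*N)
    A = D2[known_mask.ravel(), :]                # keep only rows of known pixels
    y = block[known_mask]
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
    omp.fit(A, y)
    full = (D2 @ omp.coef_).reshape(N, N)        # synthesized rectangular block
    full[known_mask] = block[known_mask]         # keep the original known samples
    return full
```

The extended block can then be coded with a conventional rectangular DCT, which is the point of the extension.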
{"title":"Coding Of Non-Rectangular Signals With Block-Based Transforms","authors":"P. Das, N. Horst, M. Wien","doi":"10.1109/ICIP40778.2020.9191301","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9191301","url":null,"abstract":"This paper presents a transform coding technique for non-rectangular 2-D signals by extending the signal into a rectangular block in order to enable conventional block-based transform coding. The technique could be suitable for coding residuals of prediction blocks using geometric partitioning which has been adopted into the draft Versatile Video Coding standard. The extension of the non-rectangular signal is found using a sparse solution set generated by applying Orthogonal Matching Pursuits using partitioned transform bases. The method developed in this paper is based on Discrete Cosine Transform. Results achieved in an experimental setup outside of the video coding loop are presented for signals of triangular and trapezoidal shape in comparison to the shape-adaptive DCT. Encouraging gains are observed specifically for larger block sizes and in dependency of the quantization parameter and the partitioning shape.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130587151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shape-Adaptive Kernel Network for Dense Object Detection
Pub Date: 2020-10-01 | DOI: 10.1109/ICIP40778.2020.9190767
H. Kim, Sunghun Joung, Ig-Jae Kim, K. Sohn
Dense object detectors that are applied over a regular, dense grid have advanced and drawn attention in recent years. Their fully convolutional nature greatly improves the computational efficiency of object detectors compared to two-stage detectors. However, their ability to adjust to shape variation on a regular grid is still limited. In this paper we introduce a new framework, the shape-adaptive kernel network, to handle spatial manipulation of input data in convolutional kernel space. At the heart of our approach is aligning the original kernel space to recover the shape variation of each input feature on the regular grid. To this end, we propose a shape-adaptive kernel sampler that adjusts a dynamic convolutional kernel conditioned on the input. To increase the flexibility of the geometric transformation, a cascade refinement module is designed, which first estimates a global transformation grid and then estimates local offsets in convolutional kernel space. Our experiments demonstrate the effectiveness of the shape-adaptive kernel network for dense object detection on various benchmarks.
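As a loose analogy, input-conditioned kernel sampling is reminiscent of deformable convolution. The sketch below uses torchvision's deform_conv2d to predict per-location kernel offsets from the input; the module name and design are our own assumptions and do not reproduce the authors' shape-adaptive kernel sampler or cascade refinement.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class OffsetKernelSampler(nn.Module):
    """Toy input-conditioned kernel sampling: offsets are predicted from the feature map."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        self.offset = nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, k, k))
        nn.init.kaiming_uniform_(self.weight, a=1)

    def forward(self, x):                       # x: (N, C, H, W)
        off = self.offset(x)                    # (N, 2*k*k, H, W) per-location offsets
        return deform_conv2d(x, off, self.weight, padding=self.k // 2)
```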
{"title":"Shape-Adaptive Kernel Network for Dense Object Detection","authors":"H. Kim, Sunghun Joung, Ig-Jae Kim, K. Sohn","doi":"10.1109/ICIP40778.2020.9190767","DOIUrl":"https://doi.org/10.1109/ICIP40778.2020.9190767","url":null,"abstract":"Dense object detectors that are applied over a regular, dense grid have advanced and drawn their attention in recent days. Their fully convolutional nature greatly advances the computational efficiency of object detectors compared to the two-stage detectors. However, the lack of the ability to adjust shape variation on a regular grid is still limited. In this paper we introduce a new framework, shape-adaptive kernel network, to handle spatial manipulation of input data in convolutional kernel space. At the heart of out approach is to align the original kernel space recovering shape variation of each input feature on regular grid. To this end, we propose a shape-adaptive kernel sampler to adjust dynamic convolutional kernel conditioned on input. To increase the flexibility of geometric transformation, a cascade refinement module is designed, which first estimates the global transformation grid and then estimates local offset in convolutional kernel space. Our experiments demonstrate the effectiveness of the shape-adaptive kernel network for dense object detection on various benchmarks.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130791269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}