Mesh Classification With Dilated Mesh Convolutions
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506311
Vinit Veerendraveer Singh, Shivanand Venkanna Sheshappanavar, C. Kambhamettu
Unlike images, meshes are irregular and unstructured, so it is not trivial to extend existing image-based deep learning approaches to mesh analysis. In this paper, inspired by dilated convolutions for images, we proffer dilated convolutions for meshes. Our Dilated Mesh Convolution (DMC) unit inflates the kernels' receptive field without increasing the number of learnable parameters. We also propose a Stacked Dilated Mesh Convolution (SDMC) block built by stacking DMC units. It considers spatial regions around mesh faces at multiple scales while summarizing the neighboring contextual information. We accommodated SDMC in MeshNet to classify 3D meshes. Experimental results demonstrate that this redesigned model significantly improves classification accuracy on multiple datasets. Code is available at https://github.com/VimsLab/DMC.
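To make the dilation idea above concrete, the sketch below (plain Python/NumPy, not the authors' code) gathers, for every mesh face, the faces at exactly hop distance d in the face-adjacency graph and applies one shared weight vector to the pooled neighborhood; widening d enlarges the receptive field while the parameter count stays fixed. The adjacency list, feature matrix, and weight vector are assumed inputs.

```python
# Hypothetical sketch of "dilated" neighborhoods on a mesh face graph.
from collections import deque
import numpy as np

def dilated_neighbors(adj, face, d):
    """Return faces whose hop distance from `face` equals d (BFS on face adjacency)."""
    dist = {face: 0}
    queue = deque([face])
    out = []
    while queue:
        f = queue.popleft()
        if dist[f] == d:
            out.append(f)
            continue                      # do not expand past radius d
        for g in adj[f]:
            if g not in dist:
                dist[g] = dist[f] + 1
                queue.append(g)
    return out

def dilated_mesh_conv(features, adj, weights, d=2):
    """Aggregate each face's dilation-d neighborhood with one shared weight vector."""
    n_faces, _ = features.shape
    out = np.zeros(n_faces)
    for f in range(n_faces):
        nbrs = dilated_neighbors(adj, f, d) or [f]
        pooled = features[nbrs].mean(axis=0)   # summarize neighborhood context
        out[f] = pooled @ weights              # same parameter count for any d
    return out
```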
{"title":"Mesh Classification With Dilated Mesh Convolutions","authors":"Vinit Veerendraveer Singh, Shivanand Venkanna Sheshappanavar, C. Kambhamettu","doi":"10.1109/ICIP42928.2021.9506311","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506311","url":null,"abstract":"Unlike images, meshes are irregular and unstructured. Thus, it is not trivial to extend existing image-based deep learning approaches for mesh analysis. In this paper, inspired by dilated convolutions for images, we proffer dilated convolutions for meshes. Our Dilated Mesh Convolution (DMC) unit inflates the kernels’ receptive field without increasing the number of learnable parameters. We also propose a Stacked Dilated Mesh Convolution (SDMC) block by stacking DMC units. It considers spatial regions around mesh faces’ at multiple scales while summarizing the neighboring contextual information. We accommodated SDMC in MeshNet to classify 3D meshes. Experimental results demonstrate that this redesigned model significantly improves classification accuracy on multiple data sets. Code is available at https://github.com/VimsLab/DMC.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130294096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical Domain-Consistent Network For Cross-Domain Object Detection
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506743
Yuanyuan Liu, Ziyang Liu, Fang Fang, Zhanghua Fu, Zhanlong Chen
Cross-domain object detection is a challenging task due to multi-level domain shift in an unseen domain. To address this problem, this paper proposes a hierarchical domain-consistent network (HDCN) for cross-domain object detection, which suppresses pixel-level, image-level, and instance-level domain shift by jointly aligning features at all three levels. First, at the pixel-level alignment stage, a pixel-level subnet with foreground-aware attention learning and pixel-level adversarial learning focuses on local, transferable foreground information. Then, at the image-level alignment stage, global domain-invariant features are learned from the whole image through image-level adversarial learning. Finally, at the instance-level alignment stage, a prototype graph convolution network aligns instance distributions by minimizing the distance between prototypes of the same category from different domains. Moreover, to avoid non-convergence during multi-level feature alignment, a domain-consistent loss is proposed to harmonize the adaptation training process. Comprehensive results on various cross-domain detection tasks demonstrate the broad applicability and effectiveness of the proposed approach.
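As a hedged illustration of the adversarial alignment mentioned above (not the authors' HDCN subnets), the PyTorch sketch below shows the standard gradient-reversal construction often used for image-level alignment: a small domain classifier learns to separate source from target features while reversed gradients push the backbone toward domain-invariant representations. Module names and dimensions are illustrative.

```python
# Illustrative gradient-reversal layer plus domain classifier (assumed names/sizes).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None   # reversed gradient flows to the backbone

class DomainClassifier(nn.Module):
    def __init__(self, in_dim=256, lambda_=1.0):
        super().__init__()
        self.lambda_ = lambda_
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feat):
        # feat: (N, in_dim) pooled image-level features
        rev = GradReverse.apply(feat, self.lambda_)
        return self.net(rev)   # logits: source vs. target domain
```

In such a setup the classifier is typically trained with a binary cross-entropy loss against domain labels (e.g., source = 0, target = 1).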
{"title":"Hierarchical Domain-Consistent Network For Cross-Domain Object Detection","authors":"Yuanyuan Liu, Ziyang Liu, Fang Fang, Zhanghua Fu, Zhanlong Chen","doi":"10.1109/ICIP42928.2021.9506743","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506743","url":null,"abstract":"Cross-domain object detection is a very challenging task due to multi-level domain shift in an unseen domain. To address the problem, this paper proposes a hierarchical domain-consistent network (HDCN) for cross-domain object detection, which effectively suppresses pixel-level, image-level, as well as instance-level domain shift via jointly aligning three-level features. Firstly, at the pixel-level feature alignment stage, a pixel-level subnet with foreground-aware attention learning and pixel-level adversarial learning is proposed to focus on local foreground transferable information. Then, at the image-level feature alignment stage, global domain-invariant features are learned from the whole image through image-level adversarial learning. Finally, at the instance-level alignment stage, a prototype graph convolution network is conducted to guarantee distribution alignment of instances by minimizing the distance of prototypes with the same category but from different domains. Moreover, to avoid the non-convergence problem during multi-level feature alignment, a domain-consistent loss is proposed to harmonize the adaptation training process. Comprehensive results on various cross-domain detection tasks demonstrate the broad applicability and effectiveness of the proposed approach.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126664220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Multi-Frame Future Prediction By Leveraging View Synthesis
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506508
Kenan E. Ak, Ying Sun, Joo-Hwee Lim
In this paper, we focus on the problem of video prediction, i.e., future frame prediction. Most state-of-the-art techniques focus on synthesizing a single future frame at each step. However, this means the model consumes its own predicted frames when synthesizing multi-step predictions, resulting in gradual performance degradation due to accumulating pixel errors. To alleviate this issue, we propose a model that can handle multi-step prediction. Additionally, we employ techniques from view synthesis for future frame prediction, two problems that are treated independently in the literature. Our method employs multi-view camera pose prediction and depth prediction networks to project the last available frame to the desired future frames via a differentiable point cloud renderer. For the synthesis of moving objects, we utilize an additional refinement stage. In experiments, we show that the proposed framework outperforms state-of-the-art methods on both the KITTI and Cityscapes datasets.
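The core geometric step described above, warping the last frame into a future view using predicted depth and a predicted relative camera pose, can be sketched with a pinhole model as below. This NumPy snippet only computes where each pixel lands in the future view; the paper additionally renders through a differentiable point cloud renderer and refines moving objects. K, T, and the depth map are assumed inputs.

```python
# Pinhole-model reprojection sketch: back-project with depth, transform with pose, re-project.
import numpy as np

def reproject(depth, K, T):
    """depth: (H, W); K: (3, 3) intrinsics; T: (4, 4) relative pose to the future camera."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)   # homogeneous pixels
    rays = np.linalg.inv(K) @ pix                                    # unit-depth rays
    pts = rays * depth.reshape(1, -1)                                # 3D points, current camera
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    pts_fut = (T @ pts_h)[:3]                                        # points in future camera frame
    proj = K @ pts_fut
    uv = proj[:2] / np.clip(proj[2:], 1e-6, None)                    # future pixel coordinates
    return uv.reshape(2, H, W), pts_fut[2].reshape(H, W)             # coords plus warped depth
```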
{"title":"Robust Multi-Frame Future Prediction By Leveraging View Synthesis","authors":"Kenan E. Ak, Ying Sun, Joo-Hwee Lim","doi":"10.1109/ICIP42928.2021.9506508","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506508","url":null,"abstract":"In this paper, we focus on the problem of video prediction, i.e., future frame prediction. Most state-of-the-art techniques focus on synthesizing a single future frame at each step. However, this leads to utilizing the model’s own predicted frames when synthesizing multi-step prediction, resulting in gradual performance degradation due to accumulating errors in pixels. To alleviate this issue, we propose a model that can handle multi-step prediction. Additionally, we employ techniques to leverage from view synthesis for future frame prediction, where both problems are treated independently in the literature. Our proposed method employs multiview camera pose prediction and depth-prediction networks to project the last available frame to desired future frames via differentiable point cloud renderer. For the synthesis of moving objects, we utilize an additional refinement stage. In experiments, we show that the proposed framework outperforms state-of-theart methods in both KITTI and Cityscapes datasets.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124030093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deepfake Video Detection Using 3D-Attentional Inception Convolutional Neural Network
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506381
Changlei Lu, B. Liu, Wenbo Zhou, Qi Chu, Nenghai Yu
The recent spike of deepfake techniques has received considerable attention due to security concerns. To mitigate the potential risks posed by deepfake techniques, many detection methods have been proposed. However, most existing works merely leverage spatial information from separate frames and ignore valuable inter-frame temporal information. In this paper, we propose a deepfake detection scheme that uses a 3D-attentional inception network. The proposed model captures spatial and temporal information simultaneously through its 3D kernels. Furthermore, channel and spatial-temporal attention modules are applied to improve detection capability. Comprehensive experiments demonstrate that our scheme outperforms state-of-the-art methods.
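As a rough sketch of one of the attention components mentioned above, the PyTorch module below implements a squeeze-and-excitation style channel attention block for 5D video tensors (batch, channels, time, height, width). The paper's exact channel and spatial-temporal attention modules may differ, and the reduction ratio is an assumption.

```python
# Illustrative channel attention for video features; not the paper's exact module.
import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)          # squeeze T, H, W
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                             # x: (N, C, T, H, W)
        n, c = x.shape[:2]
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1, 1)
        return x * w                                  # reweight channels
```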
{"title":"Deepfake Video Detection Using 3D-Attentional Inception Convolutional Neural Network","authors":"Changlei Lu, B. Liu, Wenbo Zhou, Qi Chu, Nenghai Yu","doi":"10.1109/ICIP42928.2021.9506381","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506381","url":null,"abstract":"The current spike of deepfake techniques has received considerable attention due to security concerns. To mitigate the potential risks brought by deepfake techniques, many detection methods have been proposed. However, most existing works merely leverage spatial information from separate frames and ignore valuable inter-frame temporal information. In this paper, we propose a deepfake detection scheme that uses 3D-attentional inception network. The proposed model encompasses both spatial and temporal information simultaneously with the 3D kernels. Furthermore, the channel and spatial-temporal attention modules are applied to improve detection capabilities. Comprehensive experiments demonstrate that our scheme outperforms state-of-the-art methods.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121163270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improved Multiclass Adaboost For Image Classification: The Role Of Tree Optimization
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506569
Arman Zharmagambetov, Magzhan Gabidolla, M. A. Carreira-Perpiñán
Decision tree boosting remains an important and widely recognized method for image classification, despite the dominance of deep learning-based approaches in this area. Provided with good image features, it can produce a powerful model with unique properties, such as strong predictive power, scalability, and interpretability. In this paper, we propose a novel tree boosting framework that capitalizes on the idea of using shallow, sparse, and yet powerful oblique decision trees (trained with the recently proposed Tree Alternating Optimization algorithm) as the base learners. We empirically show that the resulting model achieves better or comparable performance (both in terms of accuracy and model size) against established boosting algorithms such as gradient boosting or AdaBoost on a number of benchmarks. Further, we show that such trees can directly and efficiently handle multiclass problems without the one-vs-all strategy employed by most practical boosting implementations.
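Because the framework builds on multiclass AdaBoost, a minimal SAMME-style boosting loop is sketched below. The paper's base learners are sparse oblique trees trained with Tree Alternating Optimization (TAO); an ordinary axis-aligned scikit-learn tree is used here purely as a stand-in.

```python
# Minimal SAMME multiclass AdaBoost loop with a stand-in base learner.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def samme_boost(X, y, n_rounds=50, max_depth=3):
    n = len(y)
    K = len(np.unique(y))                                  # number of classes
    w = np.full(n, 1.0 / n)                                # sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=max_depth).fit(X, y, sample_weight=w)
        miss = h.predict(X) != y
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err) + np.log(K - 1)    # SAMME multiclass correction
        if alpha <= 0:
            break                                          # learner no better than chance
        w *= np.exp(alpha * miss)                          # upweight misclassified samples
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas
```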
{"title":"Improved Multiclass Adaboost For Image Classification: The Role Of Tree Optimization","authors":"Arman Zharmagambetov, Magzhan Gabidolla, M. A. Carreira-Perpiñán","doi":"10.1109/ICIP42928.2021.9506569","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506569","url":null,"abstract":"Decision tree boosting is considered as an important and widely recognized method in image classification, despite dominance of the deep learning based approaches in this area. Provided with good image features, it can produce a powerful model with unique properties, such as strong predictive power, scalability, interpretability, etc. In this paper, we propose a novel tree boosting framework which capitalizes on the idea of using shallow, sparse and yet powerful oblique decision trees (trained with recently proposed Tree Alternating optimization algorithm) as the base learners. We empirically show that the resulting model achieves better or comparable performance (both in terms of accuracy and model size) against established boosting algorithms such as gradient boosting or AdaBoost in number of benchmarks. Further, we show that such trees can directly and efficiently handle multiclass problems without using one-vs-all strategy employed by most of the practical boosting implementations.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114236708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reducing Stair Artifacts in CT Reconstruction
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506066
Mark A Wedekind, Eric Oertel, Susana Castillo, M. Magnor
Computed Tomography is increasingly employed for non-destructive evaluation, with the aim of reconstructing a surface mesh of a scanned object from radiographic projections. State-of-the-art algorithms first reconstruct a voxel grid and then extract a surface mesh using existing meshing algorithms, often leading to stair-like aliasing artifacts along the grid axes, due to the grid’s orientation-dependent resolution. We circumvent such artifacts in filtered backprojection reconstructions by optimizing the mesh’s vertex positions using information taken directly from the projections, rather than from a voxel grid. We show that our approach reduces stair artifacts both visibly and measurably, at relatively little additional computational cost. Our method can be tied into existing mesh extraction algorithms and removes stair artifacts almost entirely.
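A generic, hedged sketch of the projection-driven refinement described above: vertex positions are treated as optimizable parameters and adjusted by gradient descent to reduce a user-supplied differentiable discrepancy between simulated and measured projections. The `project_and_compare` callable is a placeholder, not the authors' renderer or objective.

```python
# Generic projection-driven vertex refinement loop (assumed interface).
import torch

def refine_vertices(vertices, project_and_compare, steps=100, lr=1e-2):
    """vertices: (V, 3) tensor; project_and_compare: differentiable loss vs. measured projections."""
    v = torch.nn.Parameter(vertices.clone())
    opt = torch.optim.Adam([v], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = project_and_compare(v)   # e.g. L2 between simulated and measured projections
        loss.backward()
        opt.step()
    return v.detach()
```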
{"title":"Reducing Stair Artifacts in CT Reconstruction","authors":"Mark A Wedekind, Eric Oertel, Susana Castillo, M. Magnor","doi":"10.1109/ICIP42928.2021.9506066","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506066","url":null,"abstract":"Computed Tomography is increasingly employed for non-destructive evaluation, with the aim of reconstructing a surface mesh of a scanned object from radiographic projections. State-of-the-art algorithms first reconstruct a voxel grid and then extract a surface mesh using existing meshing algorithms, often leading to stair-like aliasing artifacts along the grid axes, due to the grid’s orientation-dependent resolution. We circumvent such artifacts in filtered backprojection reconstructions by optimizing the mesh’s vertex positions using information taken directly from the projections, rather than from a voxel grid. We show that our approach reduces stair artifacts both visibly and measurably, at relatively little additional computational cost. Our method can be tied into existing mesh extraction algorithms and removes stair artifacts almost entirely.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114239815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fs-Net: Filter Selection Network For Hyperspectral Reconstruction
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506576
Liutao Yang, Zhongnian Li, Zongxiang Pei, Daoqiang Zhang
Optimizing spectral filters for hyperspectral reconstruction has received increasing attention recently. However, current filter selection methods suffer from extremely high computational complexity due to exhaustive optimization. In this paper, in order to reduce the computational complexity, we propose a novel Filter Selection Network (FS-Net) to select filters and learn the reconstruction network simultaneously. Specifically, we propose an end-to-end method to embed filter selection in FS-Net by setting the spectral response functions as the input layer. Furthermore, we propose a non-negative L1 sparse regularization (NN-L1) to select optical filters automatically by sparsifying the input layer. In addition, we develop a two-stage training strategy for adjusting the number of selected filters. Experiments on public datasets show that our proposed method considerably improves reconstruction quality.
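To illustrate the sparsifying selection idea described above, the PyTorch sketch below places a non-negative gate on each candidate filter's response and exposes an L1 penalty so that training drives most gates toward zero, leaving only a few selected filters. The layer and method names are illustrative and not the paper's code.

```python
# Illustrative non-negative, L1-regularized gating of candidate spectral filters.
import torch
import torch.nn as nn

class FilterGate(nn.Module):
    def __init__(self, n_filters):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(n_filters))    # one gate per candidate filter

    def forward(self, responses):                           # responses: (N, n_filters)
        g = torch.clamp(self.gate, min=0.0)                 # enforce non-negativity
        return responses * g

    def l1_penalty(self):
        return torch.clamp(self.gate, min=0.0).sum()        # sparsity drives filter selection

# Training loss (sketch): reconstruction_loss + lam * gate_layer.l1_penalty()
```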
{"title":"Fs-Net: Filter Selection Network For Hyperspectral Reconstruction","authors":"Liutao Yang, Zhongnian Li, Zongxiang Pei, Daoqiang Zhang","doi":"10.1109/ICIP42928.2021.9506576","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506576","url":null,"abstract":"optimizing spectral filters for hyperspectral reconstruction has received increasing attentions recently. However, current filter selection methods suffer from extremely high computational complexity due to exhaustive optimization. In this paper, in order to reduce the computational complexity, we propose a novel Filter Selection Network (FS-Net) to select filters and learn the reconstruction network simultaneously. Specifically, we propose an end-to-end method to embed filter selection in FS-Net by setting spectral response functions as the input layer. Furthermore, we propose a non-negative Ll sparse regularization (NN-LI) to select optical filters automatically by sparsifying the input layer. Besides, we develop a two-stage training strategy for adjusting the number of selected filters. Experiments on public datasets show that our proposed method can considerably improve the reconstruction quality.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114329415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Strategies of Deep Learning for Tomographic Reconstruction
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506395
Xiaogang Yang, C. Schroer
In this article, we introduce three different strategies for tomographic reconstruction based on deep learning. These algorithms use model-based learning for iterative optimization. We discuss the basic principles of developing these algorithms, and their performance is analyzed and evaluated both theoretically and on simulated reconstructions. We developed open-source software to run these algorithms in the same framework. In the simulation results, all of these deep learning algorithms showed improvements in reconstruction quality and accuracy, with the strategy based on Generative Adversarial Networks showing a particular advantage.
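One common form of the model-based learned iteration referred to above is an unrolled gradient scheme in which a small CNN takes the place of a hand-crafted regularizer. The sketch below assumes `forward_op` and `adjoint_op` callables standing in for the tomographic projector and its adjoint; it is a generic template, not any of the three specific strategies in the paper.

```python
# Generic unrolled, model-based reconstruction with a learned regularizer per iteration.
import torch
import torch.nn as nn

class UnrolledRecon(nn.Module):
    def __init__(self, n_iters=8, channels=32):
        super().__init__()
        self.steps = nn.Parameter(torch.full((n_iters,), 0.1))   # learned step sizes
        self.regs = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(channels, 1, 3, padding=1))
            for _ in range(n_iters)])

    def forward(self, x0, sino, forward_op, adjoint_op):
        x = x0
        for step, reg in zip(self.steps, self.regs):
            grad = adjoint_op(forward_op(x) - sino)   # data-consistency gradient
            x = x - step * grad + reg(x)               # learned regularization update
        return x
```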
{"title":"Strategies of Deep Learning for Tomographic Reconstruction","authors":"Xiaogang Yang, C. Schroer","doi":"10.1109/ICIP42928.2021.9506395","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506395","url":null,"abstract":"In this article, we introduce three different strategies of tomographic reconstruction based on deep learning. These algorithms are model-based learning for iterative optimization. We discuss the basic principles of developing these algorithms. The performance of them is analyzed and evaluated both on theory and simulation reconstruction. We developed open-source software to run these algorithms in the same framework. From the simulation results, all these deep learning algorithms showed improvements in reconstruction quality and accuracy where the strategy based on Generative Adversarial Networks showed the advantage especially.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116265870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Resolution Improvement In FZA Lens-Less Camera By Synthesizing Images Captured With Different Mask-Sensor Distances
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506638
Xiao Chen, Tomoya Nakamura, Xiuxi Pan, Kazuyuki Tajima, K. Yamaguchi, T. Shimano, M. Yamaguchi
The Fresnel zone aperture (FZA) lens-less camera is a class of computational imaging system that employs an FZA as a coded mask instead of an optical lens. An FZA lens-less camera can perform fast deconvolution reconstruction and supports re-focusing. However, the reconstructed image's spatial resolution is restricted by diffraction when using the conventional method derived from the geometrical-optics model. In a previous study, we quantitatively analyzed the diffraction propagation between mask and sensor and proposed a color-channel synthesis reconstruction method based on wave-optics theory. This study proposes a novel image reconstruction method that does not distort the color information, comprehensively synthesizing two images captured with different mask-sensor distances to mitigate the influence of diffraction and improve the image resolution. Numerical simulation and optical experiment results confirm that the proposed method improves the spatial resolution to about twice that of the conventional method based on the geometrical-optics model.
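For context on the geometrical-optics baseline discussed above, the NumPy sketch below builds a Fresnel zone aperture transmittance pattern and reconstructs a single capture with a Wiener-style inverse filter. The constant beta and the regularization eps are assumptions, and the paper's contribution (synthesizing two captures at different mask-sensor distances under wave optics) is not reproduced here.

```python
# FZA pattern and single-shot Fourier-domain deconvolution (geometrical-optics baseline sketch).
import numpy as np

def fza_pattern(size, beta):
    y, x = np.indices((size, size)) - size // 2
    r2 = x**2 + y**2
    return 0.5 * (1 + np.cos(np.pi * r2 / beta))         # Fresnel zone aperture transmittance

def wiener_deconv(sensor_img, psf, eps=1e-2):
    S = np.fft.fft2(sensor_img)
    H = np.fft.fft2(np.fft.ifftshift(psf))
    rec = np.conj(H) * S / (np.abs(H)**2 + eps)           # regularized inverse filter
    return np.real(np.fft.ifft2(rec))
```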
{"title":"Resolution Improvement In FZA Lens-Less Camera By Synthesizing Images Captured With Different Mask-Sensor Distances","authors":"Xiao Chen, Tomoya Nakamura, Xiuxi Pan, Kazuyuki Tajima, K. Yamaguchi, T. Shimano, M. Yamaguchi","doi":"10.1109/ICIP42928.2021.9506638","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506638","url":null,"abstract":"Fresnel zone aperture (FZA) lens-less camera is a class of computational imaging systems that employs an FZA as a coded mask instead of an optical lens. FZA lens-less camera can perform fast deconvolution reconstruction and realize the re-focusing function. However, the reconstructed image’s spatial resolution is restricted by diffraction when using the conventional method derived from the geometrical optics model. In a previous study, we quantitatively analyzed the diffraction propagation between mask and sensor. Then we proposed a color-channel synthesis reconstruction method based on wave-optics theory. This study proposed a novel image reconstruction method without distorting the color information, comprehensively synthesizing two images captured with different mask-sensor distances to mitigate the diffraction influence and improve the image resolution. The numerical simulation and optical experiment results confirm that the proposed method can improve the spatial resolution to about two times that of the conventional method based on the geometrical optics model.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"31 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116316627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Let Them Choose What They Want: A Multi-Task CNN Architecture Leveraging Mid-Level Deep Representations for Face Attribute Classification
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506456
Zhenduo Chen, Feng Liu, Zhenglai Zhao
Face Attribute Classification (FAC) is an important task in computer vision, aiming to predict the facial attributes of a given image. However, the value of mid-level feature information and the correlations between face attributes are often ignored by deep learning-based FAC methods. To address these problems, we propose a novel and effective multi-task CNN architecture. Instead of predicting all 40 attributes together, an attribute grouping strategy divides the 40 attributes into 8 correlated task groups. Meanwhile, through a Fusion Layer, mid-level deep representations are fused with the original feature representations to jointly predict the face attributes. Furthermore, Task-unique Attention Modules help learn more task-specific feature representations, yielding higher FAC accuracy. Extensive experiments on the CelebA dataset demonstrate that our method outperforms state-of-the-art FAC methods.
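A hedged sketch of the overall layout described above: a shared backbone (not shown) produces mid-level and high-level feature vectors, a fusion layer combines them, and one head per attribute group emits that group's attribute logits. The feature dimensions and equal group sizes (8 groups of 5 attributes = 40) are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative multi-task head layout with mid/high-level feature fusion and grouped attributes.
import torch
import torch.nn as nn

class GroupedFAC(nn.Module):
    def __init__(self, mid_dim=128, high_dim=512, group_sizes=(5,) * 8):
        super().__init__()
        self.fuse = nn.Linear(mid_dim + high_dim, 256)         # fuse mid- and high-level features
        self.heads = nn.ModuleList([nn.Linear(256, g) for g in group_sizes])

    def forward(self, mid_feat, high_feat):
        fused = torch.relu(self.fuse(torch.cat([mid_feat, high_feat], dim=1)))
        return [head(fused) for head in self.heads]             # per-group attribute logits
```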
{"title":"Let Them Choose What They Want: A Multi-Task CNN Architecture Leveraging Mid-Level Deep Representations for Face Attribute Classification","authors":"Zhenduo Chen, Feng Liu, Zhenglai Zhao","doi":"10.1109/ICIP42928.2021.9506456","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506456","url":null,"abstract":"Face Attributes Classification (FAC) is an important task in computer vision, aiming to predict the facial attributes of a given image. However, the value of mid-level feature information and the correlation between face attributes are always ignored by deep learning-based FAC methods. In order to solve these problems, we propose a novel and effective Multi-task CNN architecture. Instead of predicting all 40 attributes together, an attribute grouping strategy is proposed to divide the 40 attributes into 8 task groups correlatively. Meanwhile, through the Fusion Layer, mid-level deep representations are fused into the original feature representations to jointly predict the face attributes. Furthermore, the Task-unique Attention Modules can help learn more task-specific feature representations, obtaining higher FAC accuracy. Extensive experiments on the CelebA dataset demonstrate that our method outperforms state-of-the-art FAC methods.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116334359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}