Learning Semantic Representations via Joint 3D Face Reconstruction and Facial Attribute Estimation
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412426
Zichun Weng, Youjun Xiang, Xianfeng Li, Juntao Liang, W. Huo, Yuli Fu
We propose a novel joint framework for 3D face reconstruction (3DFR) that integrates facial attribute estimation (FAE) as an auxiliary task. One of the essential problems of 3DFR is to extract semantic facial features (e.g., Big Nose, High Cheekbones, and Asian) from in-the-wild 2D images, a problem inherently shared with FAE. These two tasks, though heterogeneous, are highly relevant to each other. To exploit this, we leverage a Convolutional Neural Network to extract facial representations shared by both the shape decoder and the attribute classifier. We further develop an in-batch hybrid-task training scheme that enables our model to learn jointly from heterogeneous facial datasets within a mini-batch. Thanks to the joint loss, which provides supervision from both the 3DFR and FAE domains, our model learns the correlations between 3D shapes and facial attributes, which benefit both feature extraction and shape inference. Quantitative evaluations and qualitative visualizations confirm the effectiveness and robustness of our joint framework.
{"title":"Learning Semantic Representations via Joint 3D Face Reconstruction and Facial Attribute Estimation","authors":"Zichun Weng, Youjun Xiang, Xianfeng Li, Juntao Liang, W. Huo, Yuli Fu","doi":"10.1109/ICPR48806.2021.9412426","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412426","url":null,"abstract":"We propose a novel joint framework for 3D face reconstruction (3DFR) that integrates facial attribute estimation (FAE) as an auxiliary task. One of the essential problems of 3DFR is to extract semantic facial features (e.g., Big Nose, High Cheekbones, and Asian) from in-the-wild 2D images, which is inherently involved with FAE. These two tasks, though heterogeneous, are highly relevant to each other. To achieve this, we leverage a Convolutional Neural Network to extract shared facial representations for both shape decoder and attribute classifier. We further develop an in-batch hybrid-task training scheme that enables our model to learn from heterogeneous facial datasets jointly within a mini-batch. Thanks to the joint loss that provides supervision from both 3DFR and FAE domains, our model learns the correlations between 3D shapes and facial attributes, which benefit both feature extraction and shape inference. Quantitative evaluation and qualitative visualization results confirm the effectiveness and robustness of our joint framework.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"41 1","pages":"9696-9702"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73532162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RobusterNet: Improving Copy-Move Forgery Detection with Volterra-based Convolutions
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412587
Efthimia Kafali, N. Vretos, T. Semertzidis, P. Daras
Convolutional Neural Networks (CNNs) have recently been introduced for copy-move forgery detection (CMFD). However, current CNN-based CMFD approaches fall short in localizing the positive (forged) class. In this paper, this issue is explored by considering both linear and nonlinear interactions between pixels. A nonlinear Inception module based on second-order Volterra kernels is proposed to improve the results of a state-of-the-art CMFD architecture. The outcome of this work shows that a combination of linear and nonlinear convolution kernels makes foreground and background pixels more separable. The proposed approach is evaluated on CASIA and CoMoFoD, two publicly available CMFD datasets, and yields improved positive-class localization performance. Moreover, the findings imply that the nonlinear Inception module provides strong robustness against a variety of post-processing attacks.
{"title":"RobusterNet: Improving Copy-Move Forgery Detection with Volterra-based Convolutions","authors":"Efthimia Kafali, N. Vretos, T. Semertzidis, P. Daras","doi":"10.1109/ICPR48806.2021.9412587","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412587","url":null,"abstract":"Convolutional Neural Networks (CNNs) have recently been introduced for addressing copy-move forgery detection (CMFD). However, current CMFD CNN-based approaches have insufficient performance commitment regarding the localization of the positive class. In this paper, this issue is explored by considering both linear and nonlinear interactions between pixels. A nonlinear Inception module based on second-order Volterra kernels is proposed, in order to ameliorate the results of a state-of-the-art CMFD architecture. The outcome of this work shows that a combination of linear and nonlinear convolution kernels can make the input foreground and background pixels more separable. The proposed approach is evaluated on CASIA and CoMoFoD, two publicly available CMFD datasets, and results to an improved positive class localization performance. Moreover, the findings of the proposed method imply that the nonlinear Inception module stimulates immense robustness against miscellaneous post processing attacks.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"34 1","pages":"1160-1165"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73561109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coherence and Identity Learning for Arbitrary-length Face Video Generation
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412380
Shuquan Ye, Chu Han, Jiaying Lin, Guoqiang Han, Shengfeng He
Face synthesis is an interesting yet challenging task in computer vision, and generating a portrait video is considerably harder than generating a single image. In this paper, we propose a novel video generation framework for synthesizing arbitrary-length face videos without any face exemplar or landmarks. To overcome the synthesis ambiguity of face video, we propose a divide-and-conquer strategy that addresses the problem from two aspects: face identity synthesis and rearrangement. To this end, we design a cascaded network with three components: an Identity-aware GAN (IA-GAN), a Face Coherence Network, and an Interpolation Network. IA-GAN synthesizes photorealistic faces with the same identity from a set of noise vectors. The Face Coherence Network rearranges the faces generated by IA-GAN while keeping inter-frame coherence. The Interpolation Network eliminates discontinuities between adjacent frames and improves the smoothness of the face video. Experimental results demonstrate that our network generates face videos of high visual quality while preserving identity, and quantitative results show that our method outperforms state-of-the-art unconditional face video generative models on multiple challenging datasets.
{"title":"Coherence and Identity Learning for Arbitrary-length Face Video Generation","authors":"Shuquan Ye, Chu Han, Jiaying Lin, Guoqiang Han, Shengfeng He","doi":"10.1109/ICPR48806.2021.9412380","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412380","url":null,"abstract":"Face synthesis is an interesting yet challenging task in computer vision. It is even much harder to generate a portrait video than a single image. In this paper, we propose a novel video generation framework for synthesizing arbitrary-length face videos without any face exemplar or landmark. To overcome the synthesis ambiguity of face video, we propose a divide-and-conquer strategy to separately address the video face synthesis problem from two aspects, face identity synthesis and rearrangement. To this end, we design a cascaded network which contains three components, Identity-aware GAN (IA-GAN), Face Coherence Network, and Interpolation Network. IA-GAN is proposed to synthesize photorealistic faces with the same identity from a set of noises. Face Coherence Network is designed to re-arrange the faces generated by IA-GAN while keeping the inter-frame coherence. Interpolation Network is introduced to eliminate the discontinuity between two adjacent frames and improve the smoothness of the face video. Experimental results demonstrate that our proposed network is able to generate face video with high visual quality while preserving the identity. Statistics show that our method outperforms state-of-the-art unconditional face video generative models in multiple challenging datasets.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"31 1","pages":"915-922"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73736795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Handwritten Signature and Text based User Verification using Smartwatch
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412048
Raghavendra Ramachandra, S. Venkatesh, K. B. Raja, C. Busch
Wrist-worn devices such as smartwatches have gained popularity as they provide quick access to information and to multiple applications. Among the numerous smartwatch applications, user verification based on handwriting is gaining momentum owing to its reliability and user-friendliness. In this paper, we present a novel technique for user verification using a smartwatch-based writing pattern or style. The proposed approach leverages accelerometer data captured from the smartwatch, which is represented using the 2D Continuous Wavelet Transform (CWT) and deep features extracted with a pre-trained ResNet50. These features are classified using an ensemble of classifiers to make the final verification decision. Extensive experiments are carried out on a newly captured dataset using two different smartwatches with three different writing scenarios (or activities). Experimental results provide critical insights and analysis for such a verification scenario.
{"title":"Handwritten Signature and Text based User Verification using Smartwatch","authors":"Raghavendra Ramachandra, S. Venkatesh, K. B. Raja, C. Busch","doi":"10.1109/ICPR48806.2021.9412048","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412048","url":null,"abstract":"Wrist-wearable devices such as smartwatch hardware have gained popularity as they provide quick access to various information and easy access to multiple applications. Among the numerous smartwatch applications, user verification based on the handwriting is gaining momentum by considering its reliability and user-friendliness. In this paper, we present a novel technique for user verification using a smartwatch based writing pattern or style. The proposed approach leverages accelerometer data captured from the smartwatch that is further represented using 2D Continuous Wavelet Transform (CWT) and deep features extracted using the pre-trained ResNet50. These features are classified using an ensemble of classifiers to make the final decision on user verification. Extensive experiments are carried out on a newly captured dataset using two different smartwatches with three different writing scenarios (or activities). Experimental results provide critical insights and analysis of the results in such a verification scenario.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"19 1","pages":"5099-5106"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73757112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dimensionality Reduction for Data Visualization and Linear Classification, and the Trade-off between Robustness and Classification Accuracy
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412865
Martin Becker, J. Lippel, Thomas Zielke
This paper has three intertwined goals. The first is to introduce a new similarity measure for scatter plots. It uses Delaunay triangulations to compare two scatter plots regarding their relative positioning of clusters. The second is to apply this measure for the robustness assessment of a recent deep neural network (DNN) approach to dimensionality reduction (DR) for data visualization. It uses a nonlinear generalization of Fisher's linear discriminant analysis (LDA) as the encoder network of a deep autoencoder (DAE). The DAE's decoder network acts as a regularizer. The third goal is to look at different variants of the DNN: ones that promise robustness and ones that promise high classification accuracies. This is to study the trade-off between these two objectives – our results support the recent claim that robustness may be at odds with accuracy; however, results that are balanced regarding both objectives are achievable. We see a restricted Boltzmann machine (RBM) pretraining and the DAE based regularization as important building blocks for achieving balanced results. As a means of assessing the robustness of DR methods, we propose a measure that is based on our similarity measure for scatter plots. The robustness measure comes with a superimposition view of Delaunay triangulations that enables a fast comparison of results from multiple DR methods.
{"title":"Dimensionality Reduction for Data Visualization and Linear Classification, and the Trade-off between Robustness and Classification Accuracy","authors":"Martin Becker, J. Lippel, Thomas Zielke","doi":"10.1109/ICPR48806.2021.9412865","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412865","url":null,"abstract":"This paper has three intertwined goals. The first is to introduce a new similarity measure for scatter plots. It uses Delaunay triangulations to compare two scatter plots regarding their relative positioning of clusters. The second is to apply this measure for the robustness assessment of a recent deep neural network (DNN) approach to dimensionality reduction (DR) for data visualization. It uses a nonlinear generalization of Fisher's linear discriminant analysis (LDA) as the encoder network of a deep autoencoder (DAE). The DAE's decoder network acts as a regularizer. The third goal is to look at different variants of the DNN: ones that promise robustness and ones that promise high classification accuracies. This is to study the trade-off between these two objectives – our results support the recent claim that robustness may be at odds with accuracy; however, results that are balanced regarding both objectives are achievable. We see a restricted Boltzmann machine (RBM) pretraining and the DAE based regularization as important building blocks for achieving balanced results. As a means of assessing the robustness of DR methods, we propose a measure that is based on our similarity measure for scatter plots. The robustness measure comes with a superimposition view of Delaunay triangulations that enables a fast comparison of results from multiple DR methods.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"9 1","pages":"6478-6485"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74271567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LiNet: A Lightweight Network for Image Super Resolution
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412823
Armin Mehri, P. B. Ardakani, A. Sappa
This paper proposes LiNet, a new lightweight network that improves efficiency in lightweight super-resolution while performing comparably to very large and costly networks that have far more parameters and operations. The proposed architecture lets the network learn more abstract properties by bypassing low-level information via multiple links. LiNet introduces a Compact Dense Module, containing a set of inner and outer blocks, to efficiently extract meaningful information, to better leverage multi-level representations before the upsampling stage, and to allow efficient information and gradient flow within the network. Experiments on benchmark datasets show that LiNet achieves favorable performance against lightweight state-of-the-art methods.
{"title":"LiNet: A Lightweight Network for Image Super Resolution","authors":"Armin Mehri, P. B. Ardakani, A. Sappa","doi":"10.1109/ICPR48806.2021.9412823","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412823","url":null,"abstract":"This paper proposes a new lightweight network, LiNet, that enhancing technical efficiency in lightweight super resolution and operating approximately like very large and costly networks in terms of number of network parameters and operations. The proposed architecture allows the network to learn more abstract properties by avoiding low-level information via multiple links. LiNet introduces a Compact Dense Module, which contains set of inner and outer blocks, to efficiently extract meaningful information, to better leverage multi-level representations before upsampling stage, and to allow an efficient information and gradient flow within the network. Experiments on benchmark datasets show that the proposed LiNet achieves favorable performance against lightweight state-of-the-art methods.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"39 1","pages":"7196-7202"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74320347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Loop-closure detection by LiDAR scan re-identification
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412843
Jukka Peltomäki, Xingyang Ni, Jussi Puura, J. Kämäräinen, H. Huttunen
In this work, loop-closure detection from LiDAR scans is formulated as an image re-identification problem. Re-identification is performed by computing Euclidean distances from a query scan to a gallery set of previous scans. The distances are computed in a feature embedding space to which the scans are mapped by a convolutional neural network (CNN). The network is trained using the triplet loss strategy. In our experiments we compare different backbone networks, variants of the triplet loss, and generic as well as LiDAR-specific data augmentation techniques. On a realistic indoor dataset, the best architecture obtains a mean average precision (mAP) above 0.94.
{"title":"Loop-closure detection by LiDAR scan re-identification","authors":"Jukka Peltomäki, Xingyang Ni, Jussi Puura, J. Kämäräinen, H. Huttunen","doi":"10.1109/ICPR48806.2021.9412843","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412843","url":null,"abstract":"In this work, loop-closure detection from LiDAR scans is defined as an image re-identification problem. Reidentification is performed by computing Euclidean distances of a query scan to a gallery set of previous scans. The distances are computed in a feature embedding space where the scans are mapped by a convolutional neural network (CNN). The network is trained using the triplet loss training strategy. In our experiments we compare different backbone networks, variants of the triplet loss and generic and LiDAR specific data augmentation techniques. With a realistic indoor dataset the best architecture obtains the mean average precision (mAP) above 0.94.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"134 1","pages":"9107-9114"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75288174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Unified Framework for Distance-Aware Domain Adaptation
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412752
Fei Wang, Youdong Ding, Huan Liang, Yuzhen Gao, W. Che
Unsupervised domain adaptation has achieved significant results by leveraging knowledge from a labeled source domain to learn a related but unlabeled target domain. Previous methods model domain discrepancy and class discrepancy insufficiently, which may lead to misalignment and poor adaptation performance. To address this problem, we propose a unified framework, called distance-aware domain adaptation, that accounts for both cross-domain distance and class-discriminative distance. In addition, a second-order statistics distance and manifold alignment are exploited to extract more information from the data. In this manner, the generalization error on the target domain in classification problems can be reduced substantially. To validate the proposed method, we conducted experiments on five public datasets together with an ablation study. The results demonstrate the good performance of our proposed method.
{"title":"A Unified Framework for Distance-Aware Domain Adaptation","authors":"Fei Wang, Youdong Ding, Huan Liang, Yuzhen Gao, W. Che","doi":"10.1109/ICPR48806.2021.9412752","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412752","url":null,"abstract":"Unsupervised domain adaptation has achieved significant results by leveraging knowledge from a source domain to learn a related but unlabeled target domain. Previous methods are insufficient to model domain discrepancy and class discrepancy, which may lead to misalignment and poor adaptation performance. To address this problem, in this paper, we propose a unified framework, called distance-aware domain adaptation, which is fully aware of both cross-domain distance and class-discriminative distance. In addition, second-order statistics distance and manifold alignment are also exploited to extract more information from data. In this manner, the generalization error of the target domain in classification problems can be reduced substantially. To validate the proposed method, we conducted experiments on five public datasets and an ablation study. The results demonstrate the good performance of our proposed method.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"211 1","pages":"1796-1803"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74436090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Incorporating a graph-matching algorithm into a muscle mechanics model
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412767
Pep Santacruz, F. Serratosa
Differential models for simulating muscle mechanics are based on iteratively updating a mesh grid and deducing its new state through a finite element model. Such models usually assume that the mesh grid is almost regular, which degrades simulation accuracy over long simulation sequences, since the mesh tends to become less regular as the number of iterations increases. We present a model that aims to reduce this accuracy degradation. It is based on recomputing the mesh grid returned by the model in each iteration through graph matching. The new model is currently in use to analyse the dynamics of the human heart when pressure is applied to it. The final goal of the project (not covered in this paper) is to deduce the optimal position and strength of the pressure applied to the heart that maximizes the chance of reviving it with minimum tissue damage. Experimental validation shows that our model recovers the muscle position across iterations more accurately than classical differential models, with an insignificant increase in runtime. Recomputing the mesh grid is therefore worthwhile, since simulation accuracy increases drastically at the expense of a small runtime increase.
{"title":"Incorporating a graph-matching algorithm into a muscle mechanics model","authors":"Pep Santacruz, F. Serratosa","doi":"10.1109/ICPR48806.2021.9412767","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412767","url":null,"abstract":"Differential models for the simulation of the muscle mechanics are based on iteratively updating a mesh grid and deducing its new state through a finite element model. Models usually assume that the mesh grid is almost regular, and this makes a degradation of the simulation accuracy in long simulation sequences, since the mesh tends to be less regular when the number of iterations increases. We present a model that has the aim of reducing this accuracy degradation. It is based on recomputing the mesh grid returned by the model in each iteration through the concept of graph matching. The new model is currently in use to analyse the dynamics of the human heart when some pressure is applied to it. The final goal of the project (which is not shown in this paper) is to deduce the optimal position and strength pressure applied to the heart that increases the chance of reviving it with the minimum tissue damage. Experimental validation shows that our model returns a higher accuracy of the muscle position through some iterations than classical differential models with an insignificant increase of runtime. Thus, it is worth recomputing the mesh grid since the simulation accuracy drastically increases at the expense of a low runtime increase.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"110 1","pages":"39-46"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74675079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CQNN: Convolutional Quadratic Neural Networks
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9413207
Pranav Mantini, Shishir K. Shah
Image classification is a fundamental task in computer vision. A variety of deep learning models based on the Convolutional Neural Network (CNN) architecture have proven to be efficient solutions. Numerous improvements have been proposed over the years, with broader, deeper, and denser networks being constructed. However, the atomic operation of these models has remained the linear unit (a single neuron). In this work, we pursue an alternative dimension by hypothesizing that the atomic operation is performed by a quadratic unit. We construct convolutional layers using quadratic neurons for feature extraction and subsequently use dense layers for classification. We perform analysis to quantify the effect of replacing linear neurons with quadratic units. Results show a clear improvement in classification accuracy with quadratic neurons over linear neurons.
{"title":"CQNN: Convolutional Quadratic Neural Networks","authors":"Pranav Mantini, Shishir K. Shah","doi":"10.1109/ICPR48806.2021.9413207","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413207","url":null,"abstract":"Image classification is a fundamental task in computer vision. A variety of deep learning models based on the Convolutional Neural Network (CNN) architecture have proven to be an efficient solution. Numerous improvements have been proposed over the years, where broader, deeper, and denser networks have been constructed. However, the atomic operation for these models has remained a linear unit (single neuron). In this work, we pursue an alternative dimension by hypothesizing the atomic operation to be performed by a quadratic unit. We construct convolutional layers using quadratic neurons for feature extraction and subsequently use dense layers for classification. We perform analysis to quantify the implication of replacing linear neurons with quadratic units. Results show a keen improvement in classification accuracy with quadratic neurons over linear neurons.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"12 1","pages":"9819-9826"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74707817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}