Scale Equivariant CNNs with Scale Steerable Filters
Pub Date: 2020-02-01 | DOI: 10.1109/MVIP49855.2020.9116889
Hanieh Naderi, Leili Goli, S. Kasaei
Convolutional Neural Networks (CNNs), despite being one of the most successful image classification methods, are not robust to most geometric transformations (rotation, isotropic scaling) because of their structural constraints. Recently, scale steerable filters have been proposed to allow scale invariance in CNNs. Although these filters improve network performance on scaled image classification tasks, they cannot maintain the scale information across the network. In this paper, this problem is addressed. First, a CNN is built using scale steerable filters. Then, a scale-equivariant network is obtained by adding a feature map to each layer so that the scale-related features are retained across the network. Finally, by defining the cost function as the cross entropy, this solution is evaluated and the model parameters are updated. The results show an improvement of about 2% over other comparable scale-equivariant and scale-invariant methods on the FMNIST-scale dataset.
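The sketch below illustrates the general idea of carrying scale information forward: one base filter is resampled at several scales and the responses are stacked along an extra "scale" axis of the feature map. This is only an illustration of scale-channel features under assumed helpers (scale_filter_bank, scale_conv), not the authors' exact scale-steerable construction.

```python
import numpy as np
from scipy.ndimage import zoom
from scipy.signal import convolve2d

def scale_filter_bank(base_filter, scales=(0.5, 1.0, 2.0)):
    """Resample one base filter to a small set of scales (hypothetical helper)."""
    bank = []
    for s in scales:
        f = zoom(base_filter, s, order=1)
        bank.append(f / (np.abs(f).sum() + 1e-8))  # keep responses comparable across scales
    return bank

def scale_conv(image, base_filter, scales=(0.5, 1.0, 2.0)):
    """Return a (num_scales, H, W) feature map; the extra axis retains scale information."""
    bank = scale_filter_bank(base_filter, scales)
    return np.stack([convolve2d(image, f, mode="same") for f in bank], axis=0)

image = np.random.rand(64, 64)
sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
features = scale_conv(image, sobel_x)
print(features.shape)  # (3, 64, 64)
```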
{"title":"Scale Equivariant CNNs with Scale Steerable Filters","authors":"Hanieh Naderi, Leili Goli, S. Kasaei","doi":"10.1109/MVIP49855.2020.9116889","DOIUrl":"https://doi.org/10.1109/MVIP49855.2020.9116889","url":null,"abstract":"Convolution Neural Networks (CNNs), despite being one of the most successful image classification methods, are not robust to most geometric transformations (rotation, isotropic scaling) because of their structural constraints. Recently, scale steerable filters have been proposed to allow scale invariance in CNNs. Although these filters enhance the network performance in scaled image classification tasks, they cannot maintain the scale information across the network. In this paper, this problem is addressed. First, a CNN is built with the usage of scale steerable filters. Then, a scale equivariat network is acquired by adding a feature map to each layer so that the scale-related features are retained across the network. At last, by defining the cost function as the cross entropy, this solution is evaluated and the model parameters are updated. The results show that it improves the perfromance about 2% over other comparable methods of scale equivariance and scale invariance, when run on the FMNIST-scale dataset.","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126383383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extracting Iso-Disparity Strip Width using a Statistical Model in a Stereo Vision System
Pub Date: 2020-02-01 | DOI: 10.1109/MVIP49855.2020.9116926
Benyamin Kheradvar, A. Mousavinia, A. M. Sodagar
Disparity maps, as the output of a stereo vision system, are an effective source of depth information for applications that require it. One example of such applications is extracting planes with arbitrary attributes from a scene using the concept of iso-disparity strips. The width and direction of the strips depend on the plane's direction and position in 3D space. In this paper, a statistical analysis is performed to model the behavior of these strips. This statistical analysis, together with a frequency analysis, reveals that for each group of iso-disparity strips corresponding to a single plane in 3D, the strip width can be represented by an average value superposed with Additive Gaussian Noise (AGN). This means that a simple averaging technique can significantly reduce the measurement noise in applications such as ground detection that use these strips. Results show that the width of iso-disparity strips can be measured with an average precision of 96% using the presented noise model.
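A minimal numerical check of the stated noise model: if measured strip widths are a true width plus additive Gaussian noise, averaging the widths of the strips belonging to one plane suppresses the noise. The true width, noise level, and strip count below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
true_width = 12.0                                        # pixels, hypothetical plane
measured = true_width + rng.normal(0.0, 1.5, size=40)    # AGN on 40 strips of one plane

single_error = abs(measured[0] - true_width) / true_width
averaged_error = abs(measured.mean() - true_width) / true_width
print(f"single-strip precision : {100 * (1 - single_error):.1f}%")
print(f"averaged precision     : {100 * (1 - averaged_error):.1f}%")
```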
{"title":"Extracting Iso-Disparity Strip Width using a Statistical Model in a Stereo Vision System","authors":"Benyamin Kheradvar, A. Mousavinia, A. M. Sodagar","doi":"10.1109/MVIP49855.2020.9116926","DOIUrl":"https://doi.org/10.1109/MVIP49855.2020.9116926","url":null,"abstract":"Disparity map images, as outputs of a stereo vision system, are known as an effective approach in applications that need depth information in their procedure. One example of such applications is extracting planes with arbitrary attributes from a scene using the concept of iso-disparity strips. The width and direction of strips depend on the plane direction and position in the 3D space. In this paper, a statistical analysis is performed to model the behavior of these strips. This statistical analysis as well as a frequency analysis reveal that for each group of iso-disparity strips, which are corresponding to a single plane in 3D, the width of strips can be represented by an average value superposed by an Additive Gaussian Noise (AGN). This means that a simple averaging technique can significantly reduce the measurement noise in applications such as ground detection using these strips. Results show that the width of iso-disparity strips can be measured with an average precision of 96% using the presented noise model.","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122205070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MVIP 2020 Table of Authors
Pub Date: 2020-02-01 | DOI: 10.1109/mvip49855.2020.9116925
{"title":"MVIP 2020 Table of Authors","authors":"","doi":"10.1109/mvip49855.2020.9116925","DOIUrl":"https://doi.org/10.1109/mvip49855.2020.9116925","url":null,"abstract":"","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132638494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Siamese Networks with Transfer Learning for Face Recognition on Small-Samples Datasets
Pub Date: 2020-02-01 | DOI: 10.1109/MVIP49855.2020.9116915
Mohsen Heidari, Kazim Fouladi-Ghaleh
Nowadays, computer-based face recognition is a mature and reliable mechanism that is widely used in many access control scenarios alongside other biometric methods. Face recognition consists of two subtasks: face verification and face identification. Given a pair of images, face verification determines whether the two images belong to the same person, while face identification must identify a specific face among the set of faces available in the database. Face recognition faces many challenges, such as angle, illumination, pose, facial expression, noise, resolution, occlusion, and the small number of samples available per class when many classes are present. In this paper, we perform face recognition by applying transfer learning in a Siamese network consisting of two identical CNNs. A pair of face images is given to the network as input; the network extracts the features of this pair and determines whether the two images belong to the same person using a similarity criterion. The results show that the proposed model is comparable with state-of-the-art models trained on datasets containing large numbers of samples. Furthermore, it improves the accuracy of face recognition compared with methods trained on datasets with few samples, reaching 95.62% on the LFW dataset.
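A minimal PyTorch sketch of the verification setup described above: two inputs pass through one shared, pretrained backbone (transfer learning), and the absolute feature difference is mapped to a same/different score. The ResNet-18 backbone and the similarity head are assumptions for illustration; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn
from torchvision import models

class SiameseVerifier(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights="IMAGENET1K_V1")  # pretrained weights for transfer learning
        backbone.fc = nn.Identity()                           # keep the 512-d embedding
        self.backbone = backbone
        self.head = nn.Sequential(nn.Linear(512, 1), nn.Sigmoid())

    def forward(self, img_a, img_b):
        feat_a = self.backbone(img_a)   # same weights are used for both branches
        feat_b = self.backbone(img_b)
        return self.head(torch.abs(feat_a - feat_b))  # probability the pair is the same person

model = SiameseVerifier()
a, b = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
print(model(a, b).shape)  # torch.Size([2, 1])
```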
{"title":"Using Siamese Networks with Transfer Learning for Face Recognition on Small-Samples Datasets","authors":"Mohsen Heidari, Kazim Fouladi-Ghaleh","doi":"10.1109/MVIP49855.2020.9116915","DOIUrl":"https://doi.org/10.1109/MVIP49855.2020.9116915","url":null,"abstract":"Nowadays, computer-based face recognition is a mature and reliable mechanism that is significantly used in many access control scenarios along with other biometric methods. Face recognition consists of two subtasks including Face Verification and Face Identification. By comparing a pair of images, Face Verification determines whether those images are related to one person or not; and Face Identification has to identify a specific face within a set of available faces in the database. There are many challenges in face recognition such as angle, illumination, pose, facial expression, noise, resolution, occlusion and the few number of one-class samples with several classes. In this paper, we are carrying out face recognition by utilizing transfer learning in a siamese network which consists of two similar CNNs. In the siamese network, a pair of two face images is given to the network as input, then the network extracts the features of this pair of images and finally, it determines whether the pair of images belongs to one person or not by using a similarity criterion. The results show that the proposed model is comparable with advanced models that are trained on datasets containing large numbers of samples. furthermore, it improves the accuracy of face recognition in comparison with methods which are trained using datasets with a few number of samples, and the mentioned accuracy is claimed to be 95.62% on LFW dataset.","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114166227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepFaceAR: Deep Face Recognition and Displaying Personal Information via Augmented Reality
Pub Date: 2020-02-01 | DOI: 10.1109/MVIP49855.2020.9116873
Amin Golnari, H. Khosravi, S. Sanei
Biometric recognition is a popular topic in machine vision. Deep neural networks have recently been used in several applications, especially in biometric recognition. In this paper, we combine a deep neural network with augmented reality to produce a system capable of recognizing the faces of individuals and displaying information about each individual as an augmented reality overlay. We used a dataset containing 1200 face images of 100 faculty members of the Shahrood University of Technology. After training, the proposed deep network reached a recognition accuracy of 99.45%. We also provide a graphical target for each person that contains their information. When a person is identified by the deep network, the target image provided for augmented reality is aligned with the angle and dimensions of the detected face and displayed on top of it.
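A rough sketch of the display step only: once a face region is located (and, in the paper, identified by the deep network), a per-person info image is resized to the face region and blended onto the frame. The Haar-cascade detector, synthetic images, and blend factor are assumptions; the recognition network itself is not shown.

```python
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)      # stand-in camera frame
info_card = np.random.randint(0, 255, (200, 200, 3), dtype=np.uint8)  # stand-in AR target image

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5):
    card = cv2.resize(info_card, (w, h))                  # match the detected face size
    roi = frame[y:y + h, x:x + w]
    frame[y:y + h, x:x + w] = cv2.addWeighted(roi, 0.4, card, 0.6, 0)  # overlay the info card
```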
{"title":"DeepFaceAR: Deep Face Recognition and Displaying Personal Information via Augmented Reality","authors":"Amin Golnari, H. Khosravi, S. Sanei","doi":"10.1109/MVIP49855.2020.9116873","DOIUrl":"https://doi.org/10.1109/MVIP49855.2020.9116873","url":null,"abstract":"Biometric recognition is a popular topic in machine vision. Deep Neural Networks have been recently used in several applications, especially in biometric recognition. In this paper, we combine a Deep Neural Network alongside Augmented Reality to produce a system capable of recognizing the faces of individuals and displaying some information about the individual as an Augmented Reality. We used a dataset containing 1200 face images of 100 faculty members of the Shahrood University of Technology. After training the proposed Deep Network, it reached the recognition accuracy of 99.45%. We also provided some graphical targets for each person that contains his information. When a person is identified by the deep network, the target image provided for augmented reality is aligned with the angle and dimensions of the detected face and displayed on top of it.","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126131215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling of Pruning Techniques for Simplifying Deep Neural Networks
Pub Date: 2020-02-01 | DOI: 10.1109/MVIP49855.2020.9116891
Morteza Mousa Pasandi, M. Hajabdollahi, N. Karimi, S. Samavi
Convolutional Neural Networks (CNNs) suffer from issues such as computational complexity and a large number of parameters. In recent years, pruning techniques have been employed to reduce the number of operations and the model size of CNNs. Different pruning methods have been proposed, based on pruning connections, channels, and filters. Various techniques and tricks accompany these pruning methods, and there is no unifying framework to model all of them. In this paper, pruning methods are investigated, and a general model that covers the majority of pruning techniques is proposed. Under this model, the advantages and disadvantages of the pruning methods can be identified and all of them can be summarized. The ultimate goal of this model is to provide a single formulation for pruning methods with different structures and applications.
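A minimal sketch of one common way to view pruning in a unified manner: a binary mask is applied to a set of structures (individual weights, or whole filters) selected by a saliency score. Magnitude is used as the score here purely for illustration; the function name and axis convention are assumptions, not the paper's formulation.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5, axis=None):
    """Zero out the lowest-saliency structures.
    axis=None prunes individual weights; axis=(1, 2, 3) prunes whole filters of a
    (out, in, kH, kW) tensor by their L1 norm (an assumed convention)."""
    if axis is None:
        score = np.abs(weights)
    else:
        score = np.abs(weights).sum(axis=axis, keepdims=True)
    threshold = np.quantile(score, sparsity)
    mask = (score >= threshold).astype(weights.dtype)   # binary pruning mask
    return weights * mask, mask

conv_w = np.random.randn(8, 4, 3, 3)
pruned_w, mask = prune_by_magnitude(conv_w, sparsity=0.5)
print(f"fraction of weights kept: {mask.mean():.2f}")
```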
{"title":"Modeling of Pruning Techniques for Simplifying Deep Neural Networks","authors":"Morteza Mousa Pasandi, M. Hajabdollahi, N. Karimi, S. Samavi","doi":"10.1109/MVIP49855.2020.9116891","DOIUrl":"https://doi.org/10.1109/MVIP49855.2020.9116891","url":null,"abstract":"Convolutional Neural Networks (CNNs) suffer from different issues such as computational complexity and the number of parameters. In recent years pruning techniques are employed to reduce the number of operations and model size in CNNs. Different pruning methods are proposed, which are based on pruning the connections, channels, and filters. Various techniques and tricks accompany pruning methods, and there is not a unifying framework to model all the pruning methods. In this paper pruning methods are investigated, and a general model which is contained the majority of pruning techniques is proposed. The advantages and disadvantages of the pruning methods can be identified, and all of them can be summarized under this model. The final goal of this model can be providing a specific method for all the pruning methods with different structures and applications.","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124848469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Watermarking by Q Learning and Matrix Factorization
Pub Date: 2020-02-01 | DOI: 10.1109/MVIP49855.2020.9116871
M. Alizadeh, H. Sajedi, B. BabaAli
Today, with the advancement of technology and the widespread use of the internet, watermarking techniques are being developed to protect copyright and data security. The methods proposed for watermarking can be divided into two main categories: spatial-domain watermarking and frequency-domain watermarking. Matrix transformation methods are often merged with another method to select the right place to hide the watermark. In this paper, a non-blind watermarking method is presented. To embed the watermark, Least Significant Bit (LSB) replacement and QR matrix factorization are exploited, and Q-learning is used to select the appropriate host blocks. The Peak Signal-to-Noise Ratio (PSNR) of the watermarked image and the extracted watermark image is used as the reward function. The proposed method improves over the corresponding algorithms without learning, achieving mean PSNR values of 56.61 dB and 55.77 dB for the QR matrix factorization and LSB replacement embedding methods, respectively.
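A small sketch of the LSB embedding branch and of the PSNR used as the reward signal: each watermark bit replaces the least significant bit of a selected host pixel. Block selection by Q-learning is not shown; the flattened-pixel placement below is an assumption for illustration.

```python
import numpy as np

def embed_lsb(host, watermark_bits):
    """Replace the LSBs of the first len(watermark_bits) pixels (flattened order)."""
    flat = host.flatten().astype(np.uint8)                # flatten() returns a copy
    n = len(watermark_bits)
    flat[:n] = (flat[:n] & 0xFE) | watermark_bits          # clear LSB, then write the bit
    return flat.reshape(host.shape)

def psnr(original, modified):
    mse = np.mean((original.astype(float) - modified.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

host = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
bits = np.random.randint(0, 2, 256, dtype=np.uint8)
watermarked = embed_lsb(host, bits)
print(f"PSNR (reward): {psnr(host, watermarked):.2f} dB")
```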
{"title":"Image Watermarking by Q Learning and Matrix Factorization","authors":"M. Alizadeh, H. Sajedi, B. BabaAli","doi":"10.1109/MVIP49855.2020.9116871","DOIUrl":"https://doi.org/10.1109/MVIP49855.2020.9116871","url":null,"abstract":"Today, with the advancement of technology and the widespread use of the internet, watermarking techniques are being developed to protect copyright and data security. The methods proposed for watermarking can be divided into two main categories: spatial domain watermarking, and frequency domain watermarking. Often matrix transformation methods are merged with another method to select the right place to hide. In this paper, a non-blind watermarking id presented. In order to embed watermark Least Significant Bit (LSB) replacement and QR matrix factorization are exploited. Q learning is used to select the appropriate host blocks. The Peak Signal-to-Noise Ratio(PSNR) of the watermarked image and the extracted watermark image is considered as the reward function. The proposed method has been improved over the algorithms mentioned above with no learning methods and achieved a mean PSNR values of 56.61 dB and 55.77 dB for QR matrix factorization and LSB replacemnet embedding method respectively.","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120921028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Occluded Visual Object Recognition Using Deep Conditional Generative Adversarial Nets and Feedforward Convolutional Neural Networks
Pub Date: 2020-02-01 | DOI: 10.1109/MVIP49855.2020.9116887
Vahid Reza Khazaie, Alireza Akhavanpour, R. Ebrahimpour
Core object recognition is the task of recognizing objects regardless of variations in conditions such as pose, illumination, or other structural modifications. This task is solved through the feedforward processing of information in the human visual system, and deep neural networks can match human performance on it. However, it is not known how object recognition is solved under more challenging conditions such as occlusion. Some computational models imply that recurrent processing might be a solution to this beyond-core object recognition task. Another potential mechanism for handling occlusion is to reconstruct the occluded part of the object using generative models. Here we used Conditional Generative Adversarial Networks for reconstruction. For occlusions of reasonable size, we were able to remove the effect of occlusion and recover the performance of the base model. We showed that exploiting GANs for reconstruction, and the information added by generative models, leads to better performance on object recognition under occlusion.
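A schematic of the two-stage pipeline described above: a conditional generator first fills in the occluded region, then an ordinary feedforward classifier runs on the reconstruction. Both networks below are stand-ins so the sketch executes; only the data flow is meant to reflect the paper.

```python
import torch
import torch.nn as nn

def recognize_under_occlusion(occluded_img, generator, classifier):
    """occluded_img: (N, 3, H, W) batch containing occluded objects."""
    with torch.no_grad():
        reconstructed = generator(occluded_img)   # cGAN generator fills the occluded region
        logits = classifier(reconstructed)        # base feedforward recognition model
    return logits.argmax(dim=1)

# Stand-in networks (assumptions) so the sketch runs end to end.
generator = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 10))
batch = torch.randn(4, 3, 64, 64)
print(recognize_under_occlusion(batch, generator, classifier))
```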
{"title":"Occluded Visual Object Recognition Using Deep Conditional Generative Adversarial Nets and Feedforward Convolutional Neural Networks","authors":"Vahid Reza Khazaie, Alireza Akhavanpour, R. Ebrahimpour","doi":"10.1109/MVIP49855.2020.9116887","DOIUrl":"https://doi.org/10.1109/MVIP49855.2020.9116887","url":null,"abstract":"Core object recognition is the task of recognizing objects without regard to any variations in the conditions like pose, illumination or any other structural modifications. This task is solved through the feedforward processing of information in the human visual system. Deep neural networks can perform like humans in this task. However, we do not know how object recognition under more challenging conditions like occlusion is solved. Some computational models imply that recurrent processing might be a solution to the beyond core object recognition task. The other potential mechanism for solving occlusion is to reconstruct the occluded part of the object taking advantage of generative models. Here we used Conditional Generative Adversarial Networks for reconstruction. For reasonable size occlusion, we were able to remove the effect of occlusion and we recovered the performance of the base model. We showed getting the benefit of GANs for reconstruction and adding information by generative models can cause a better performance in the object recognition task under occlusion.","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131118549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Classified and Comparative Study of 2-D Convolvers
Pub Date: 2020-02-01 | DOI: 10.1109/MVIP49855.2020.9116874
Mahdi Kalbasi, Hooman Nikmehr
Two-dimensional (2-D) convolution is a common operation in a wide range of signal and image processing applications such as edge detection, sharpening, and blurring. In the hardware implementation of these applications, 2-D convolution is one of the most challenging parts because it is a compute-intensive and memory-intensive operation. To address these challenges, several design techniques such as pipelining, constant multiplication, and time-sharing have been applied in the literature, leading to convolvers with different implementation features. In this paper, based on these design techniques, we classify convolvers into four classes: Non-Pipelined Convolver, Reduced-Bandwidth Pipelined Convolver, Multiplier-Less Pipelined Convolver, and Time-Shared Convolver. Then, implementation features of these classes, such as critical path delay, memory bandwidth, and resource utilization, are analytically discussed for different convolution kernel sizes. Finally, an instance of each class is captured in Verilog and evaluated by implementing it on a Virtex-7 FPGA; the reported results confirm the analytical discussion.
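For reference, a plain software version of the 2-D convolution that all four hardware classes implement; the nested loops make the per-pixel multiply-accumulate count (and hence the compute and memory pressure the paper discusses) explicit. Zero padding and the correlation form (no kernel flip, which is equivalent for symmetric kernels) are assumptions.

```python
import numpy as np

def conv2d_reference(image, kernel):
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))     # zero padding at the borders
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # each output pixel costs kh * kw multiply-accumulate operations
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(32, 32)
kernel = np.ones((3, 3)) / 9.0   # simple blurring kernel
print(conv2d_reference(image, kernel).shape)  # (32, 32)
```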
{"title":"A Classified and Comparative Study of 2-D Convolvers","authors":"Mahdi Kalbasi, Hooman Nikmehr","doi":"10.1109/MVIP49855.2020.9116874","DOIUrl":"https://doi.org/10.1109/MVIP49855.2020.9116874","url":null,"abstract":"Two-dimensional (2-D) convolution is a common operation in a wide range of signal and image processing applications such as edge detection, sharpening, and blurring. In the hardware implementation of these applications, 2d convolution is one of the most challenging parts because it is a compute-intensive and memory-intensive operation. To address these challenges, several design techniques such as pipelining, constant multiplication, and time-sharing have been applied in the literature which leads to convolvers with different implementation features. In this paper, based on design techniques, we classify these convolvers into four classes named Non-Pipelined Convolver, Reduced-Bandwidth Pipelined Convolver, Multiplier-Less Pipelined Convolver, and Time-Shared Convolver. Then, implementation features of these classes, such as critical path delay, memory bandwidth, and resource utilization, are analyticcally discussed for different convolution kernel sizes. Finally, an instance of each class is captured in Verilog and their features are evaluated by implementing them on a Virtex-7 FPGA and reported confirming the analytical discussions.","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130265463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MVIP 2020 Cover Page
Pub Date: 2020-02-01 | DOI: 10.1109/mvip49855.2020.9116892
{"title":"MVIP 2020 Cover Page","authors":"","doi":"10.1109/mvip49855.2020.9116892","DOIUrl":"https://doi.org/10.1109/mvip49855.2020.9116892","url":null,"abstract":"","PeriodicalId":255375,"journal":{"name":"2020 International Conference on Machine Vision and Image Processing (MVIP)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134143798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}