Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738772
M. Imani
Extraction of discriminative features is an important step in any classification problem, including synthetic aperture radar (SAR) image classification. Polarimetric SAR (PolSAR) images, with rich spatial features in the first two dimensions and polarimetric characteristics in the third, are a rich source of information for producing classification maps of the ground surface. Applying spatial operators such as morphological filters by reconstruction increases the dimensionality of the PolSAR data, which therefore requires feature reduction. In this work, median-mean and feature line embedding (MMFLE) is proposed for dimensionality reduction of the polarimetric-contextual cube in PolSAR images. MMFLE is stable with respect to outliers because it uses the median-mean line metric, and, through an appropriate definition of the scatter matrices, it maximizes class separability. In addition, MMFLE is a particularly strong feature reduction method when only a small training set is available, because it uses the feature line metric to model data variations and generate virtual samples. With 10 training samples per class, MMFLE achieves 94.15% and 83.01% overall classification accuracy on the Flevoland and San Francisco PolSAR datasets acquired by AIRSAR, respectively.
{"title":"Feature Line Based Feature Reduction of Polarimetric-Contextual Feature Cube for Polarimetric SAR Classification","authors":"M. Imani","doi":"10.1109/MVIP53647.2022.9738772","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738772","url":null,"abstract":"Extraction of discriminative features is an efficient step in any classification problem such as synthetic aperture radar (SAR) images classification. Polarimetric SAR (PolSAR) images with rich spatial features in two first dimensions and polarimetric characteristics in the third dimension are rich source of information for providing classification maps from the ground surface. By applying the spatial operators such as morphological filters by reconstruction, data dimensionality of the PolSAR is increased and needs feature reduction. In this work, median-mean and feature line embedding (MMFLE) is proposed for dimensionality reduction of the polarimetric-contextual cube in PolSAR images. MMFLE is stable with respect to outliers by utilizing the median-mean line metric. By an appropriate definition of scatter matrices, MMFLE maximizes the class separability. In addition, MMFLE is specially a superior feature reduction method when a small training set is available because it uses the feature line metric to model the data variations and generate virtual samples. With 10 training samples per class, MMFLE achieves 94.15% and 83.01% overall classification accuracy, respectively in Flevoland and SanFranciso PolSAR datasets acquired by AIRSAR.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116176199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738789
Mohammadreza Ghafari, A. Amirkhani, E. Rashno, Shirin Ghanbari
In recent years, tremendous advances have been made in Artificial Intelligence (AI) algorithms for image processing. Despite these advances, video compression using AI algorithms still faces major challenges, typically in two areas: higher processing load compared with traditional video compression methods, and lower visual quality of the compressed video. Addressing these two challenges is the main motivation of this article, in which we introduce a new AI-based video compression scheme. Since the processing-load challenge is most pressing in online systems, we evaluate our AI video encoder in video streaming applications. One of the most common video streaming applications is traffic cameras and video surveillance in road environments, referred to here as CCTV. Such systems have largely fixed backgrounds, so bandwidth is used inefficiently because the streamed video repeatedly transmits the same background. Our AI-based video encoder detects the fixed background using background subtraction and caches it at the client side. By separating the background from the moving objects, only the moving objects need to be sent to the destination, which saves a large amount of network bandwidth. Our experimental results show that, in exchange for an acceptable reduction in visual quality, the video compression processing load is drastically reduced.
{"title":"Novel Gaussian Mixture-based Video Coding for Fixed Background Video Streaming","authors":"Mohammadreza Ghafari, A. Amirkhani, E. Rashno, Shirin Ghanbari","doi":"10.1109/MVIP53647.2022.9738789","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738789","url":null,"abstract":"In recent years, tremendous advances have been made in Artificial Intelligence (AI) algorithms in the field of image processing. Despite these advances, video compression using AI algorithms has always faced major challenges. These challenges often lie in two areas of higher processing load in comparison with traditional video compression methods, as well as lower visual quality in video content. Careful study and solution of these two challenges is the main motivation of this article that by focusing on them, we have introduced a new video compression based on AI. Since the challenge of processing load is often present in online systems, we have examined our AI video encoder in video streaming applications. One of the most popular applications of video streaming is traffic cameras and video surveillance in road environments which here we called it CCTVs. Our idea in this type of system goes back to fixed background images, where always occupied the bandwidth not efficiently, and the streaming video is related to duplicate background images. Our AI-based video encoder detects fixed background and caches it at the client-side by the background subtraction method. By separating the background image from the moving objects, it is only enough to send the moving objects to the destination, which can save a lot of network bandwidth. Our experimental results show that, in exchange for an acceptable reduction in visual quality assessment, the video compression processing load will be drastically reduced.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114919309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738551
Mohammad Mahdi Moradi, R. Ghaderi
Because infrared images are insensitive to changes in light intensity and weather conditions, they are used in many surveillance systems and other fields. However, despite their many applications and benefits, sufficient data is often unavailable due to the high cost and the time-consuming, complicated data preparation. To address this problem, two deep neural networks based on Conditional Generative Adversarial Networks are introduced to produce synthetic infrared images. The first model applies only when paired visible and infrared images are available, in which case the mapping between the two domains is learned directly. Since many problems involve unpaired data, a second network is proposed whose goal is to learn a mapping from visible to infrared images such that the distribution of synthetic infrared images is indistinguishable from that of real ones. Two publicly available datasets have been used to train and test the proposed models. The results demonstrate that the proposed system improves peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) by 4.6199% and 3.9196%, respectively, compared to previous models.
{"title":"I-GANs for Synthetical Infrared Images Generation","authors":"Mohammad Mahdi Moradi, R. Ghaderi","doi":"10.1109/MVIP53647.2022.9738551","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738551","url":null,"abstract":"Due to the insensitivity of infrared images to changes in light intensity and weather conditions, these images are used in many surveillance systems and different fields. However, despite all the applications and benefits of these images, not enough data is available in many applications due to the high cost, time-consuming, and complicated data preparation. Two deep neural networks based on Conditional Generative Adversarial Networks are introduced to solve this problem and produce synthetical infrared images. One of these models is only for problems where the pair to pair visible and infrared images are available, and as a result, the mapping between these two domains will be learned. Given that in many of the problems we face unpaired data, another network is proposed in which the goal is to obtain a mapping from visible to infrared images so that the distribution of synthetical infrared images is indistinguishable from the real ones. Two publicly available datasets have been used to train and test the proposed models. Results properly demonstrate that the evaluation of the proposed system in regard to peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) has improved by 4.6199% and 3.9196%, respectively, compared to previous models.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121999455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738757
Mehdi Sadeghibakhi, Seyed Majid Khorashadizadeh, Reza Behboodi, A. Latif
This paper proposes an Unbiased Variable Window Size impulse noise filter (UVWS) that uses a genetic algorithm to effectively restore images corrupted by either high or low noise densities. The method consists of three stages. First, all pixels are classified as noisy or noise-free based on their intensities. In the second stage, the noisy pixels are pushed into a descending priority list, where the priority of each pixel is the number of noise-free pixels in its local neighborhood window. Finally, for each pixel in the list, a local weighted average is computed, with the weight of each neighbor optimized by the genetic algorithm (GA). The performance of the proposed method is evaluated on several benchmark images and compared with four methods from the literature. The results show that the proposed method performs better in terms of visual quality and PSNR, especially when the noise density is very high.
{"title":"Unbiased Variable Windows Size Impulse Noise Filter using Genetic Algorithm","authors":"Mehdi Sadeghibakhi, Seyed Majid Khorashadizadeh, Reza Behboodi, A. Latif","doi":"10.1109/MVIP53647.2022.9738757","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738757","url":null,"abstract":"This paper proposes an Unbiased Variable Windows Size Impulse noise filter (UVWS) using a genetic algorithm to effectively restore the corrupted images with high or slight noise densities. The method consists of three stages. First, all pixels are classified into noisy and noise-free categories based on their intensities. In the second stage, the noisy pixels are pushed into a descending priority list the priority associated with each pixel is the number of noise-free pixels in the neighbor’s local window. Finally, for each pixel in the list, a local weighted average is calculated so that the corresponding weight for each neighbor is optimized by the genetic algorithm (GA). The performance of the proposed method is evaluated on several benchmark images and compared with four methods from the literature. The results show that the proposed method performs better in terms of visual quality and PSNR especially when the noise density is very high.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121076305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738738
E. Talebi, S. Sayedi
A novel low-power, low-area Digital Pixel Sensor using the Pulse Width Modulation technique is designed in a one-poly six-metal 0.18 μm CMOS standard technology. The pixel has a pitch of 18.33 μm and a fill factor of about 24%. The Light to Time Converter (LTC) at the core of the pixel consumes only 5.5% of the pixel area. Post-layout simulation results exhibit a 90.85 dB dynamic range, a total power consumption of about 853.65 pW at 33 frames per second, and a short conversion time with a maximum of 23.25 ms. The pixel's digital output is linearized using look-up-table-based digital linearization circuitry, resulting in a root mean square pixel-wise error of 0.797 between the original and the captured images. Monte Carlo analysis shows 2.93% Fixed Pattern Noise for the pixel.
{"title":"A Low Area and Low Power Pulse Width Modulation Based Digital Pixel Sensor","authors":"E. Talebi, S. Sayedi","doi":"10.1109/MVIP53647.2022.9738738","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738738","url":null,"abstract":"A novel low power and low area Digital Pixel Sensor using Pulse Width Modulation technique is designed in one-poly six-metal 0.18μm CMOS standard technology. The pixel has a pitch of 18.33 μm and a fill factor of about 24%. The Light to Time Converter (LTC) at the core of the pixel consumes only 5.5% of the pixel area. Post-layout simulation results exhibit 90.85 dB dynamic range, total power consumption of about 853.65 pW at 33 frames per second and a short conversion time with a maximum of 23.25 ms. The pixel’s digital output is linearized by using a look-up table based digital linearization circuitry resulting in a root mean square pixel-wise error of 0.797 between the original and the captured images. Monte Carlo analysis shows 2.93% Fixed Pattern Noise for the pixel.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133804120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738771
Ahmad Vosoughi, Mohammad Hamed Samimi
Frequency response analysis (FRA) is one of the most effective methods for diagnosing mechanical faults in power transformers. Digital image processing of the FRA polar plot characteristics has recently been proposed in the literature for interpreting the frequency response of power transformers. The important advantage of this method is that it uses the phase angle of the FRA trace, in addition to its amplitude, for the analysis. The digital image processing techniques applied to the FRA polar plot detect the fault by extracting and analyzing different features of the image through texture analysis. In this study, the performance of this method is investigated on real windings to examine its ability to detect the fault extent and type. This step is necessary because the method is new to the FRA field and has so far been investigated only in simulation. Three different faults, namely axial displacement, disk space variation, and radial deformation, are implemented in the experimental setup. The results show that the approach can determine neither the fault extent nor the fault type. Therefore, essential changes must be made to the method before it can be applied in the field.
{"title":"Evaluation of the Image Processing Technique in Interpretation of Polar Plot Characteristics of Transformer Frequency Response","authors":"Ahmad Vosoughi, Mohammad Hamed Samimi","doi":"10.1109/MVIP53647.2022.9738771","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738771","url":null,"abstract":"Frequency response analysis (FRA) is one of the most efficient methods that can diagnose the mechanical faults of power transformers. Digital image processing of the FRA polar plot characteristics has been recently proposed in the literature for the interpretation of power transformers Frequency response. The important advantage of this method is using the phase angle of the FRA trace in addition to its amplitude for the analysis. The digital image processing techniques implemented on the FRA polar plot detect the fault by extracting and analyzing different features of the image by using texture analysis. In this study, the performance of this method is investigated on real windings to examine its ability in detecting the fault extent and type. This step is mandatory since the method is new in the FRA field and has been investigated only in simulation cases. Three different faults, including axial displacement, disk space variation, and radial deformation, are implemented in the experimental setup for the study. Results of implementation of this approach show that this approach neither can determine the fault extent nor the fault type. Therefore, essential changes need to be implemented in the method before applying it in the field.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114707165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738774
Amirreza BabaAhmadi, Sahar Khalafi, Fatemeh Malekipour Esfahani
Cancer is a rampant disease caused by uncontrolled cells that grow and spread throughout the body. Invasive Ductal Carcinoma (IDC) is the most common type of breast cancer and can be fatal for females if not detected early. As a result, prompt diagnosis is critical to maximizing survival rates while minimizing long-term mortality. Modern computer vision and deep learning techniques have transformed the medical image analysis arena, providing remarkable results, improved accuracy, and reduced costs. The main purpose of designing a new algorithm for detecting unusual patches in breast images was to achieve both high accuracy and low computational cost simultaneously. Therefore, a novel architecture has been designed by combining Xception and MobileNetV2. The new algorithm achieves 93.4% balanced accuracy and a 94.8% F1-score, outperforming previously published deep learning algorithms for identifying IDC in histopathology images.
{"title":"Designing an Improved Deep Learning-Based Classifier for Breast Cancer Identification in Histopathology Images","authors":"Amirreza BabaAhmadi, Sahar Khalafi, Fatemeh Malekipour Esfahani","doi":"10.1109/MVIP53647.2022.9738774","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738774","url":null,"abstract":"Cancer is a rampant phenomenon caused by uncontrollable cells that grow and spread throughout the body. Invasive Ductal Carcinoma 1 is the most common type of breast cancer, which can be fatal for females if not detected early. As a result, prompt diagnosis is critical to maximizing surveillance rates and, in the meantime, minimizing long-term mortality rates. Nowadays, modern computer vision and deep learning techniques have transformed the medical image analysis arena. Computer vision application in medical image analysis has provided us with remarkable results, enhanced accuracy, and reduced costs. The main purpose of designing a new algorithm to detect unusual patches of breast images, was to acquire both high accuracy and low computational cost, simultaneously. Therefore, a novel architecture has been designed by utilizing Xception and MobileNetV2.This new algorithm achieves 93.4% balanced accuracy and 94.8% for F1-Score, which outperforms previously published algorithms for identifying IDC histopathology images that use deep learning techniques.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125132645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738743
A. Zohrevand, Z. Imani
In recent years, Stochastic Gradient Descent (SGD) has been commonly used as the optimizer in Convolutional Neural Network (CNN) models. While many researchers have adopted CNN models for classification tasks, to the best of our knowledge, the different optimizers developed for CNNs have not been thoroughly studied and analyzed in CNN training. In this paper, we investigate the effects of various optimizers on CNN performance. Two sets of experiments are conducted. First, for classification on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch with four different optimizers: SGD, Adam, Adadelta, and AdaGrad. Second, with the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify Persian handwritten words. In both experiments, the results show that Adam and AdaGrad behave similarly and outperform the other two optimizers in terms of training cost and recognition accuracy. The effect of different initial learning rates on the performance of the Adam optimizer is also investigated experimentally; the results reveal that lower values lead to faster convergence.
{"title":"An Empirical Study of the Performance of Different Optimizers in the Deep Neural Networks","authors":"A. Zohrevand, Z. Imani","doi":"10.1109/MVIP53647.2022.9738743","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738743","url":null,"abstract":"In recent years, the Stochastic Gradient Descent (SGD) has been commonly used as an optimizer in the Conventional Neural Network (CNN) models. While many researchers have adopted CNN models to classify tasks, to the best of our knowledge, different optimizers developed for CNN have not been thoroughly studied and analyzed in the training CNNs. In this paper, attempts have been made to investigate the effects of the various optimizers on the performance of CNN. Two sets of experiments are conducted. First, for the classification of the records on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch by four different kinds of optimizers including SGD, Adam, Adadelta, and AdaGrad. Second, by the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify the Persian handwritten words. In both experiments, the results showed that Adam and AdaGrad have a relatively similar behavior and higher performance in comparison to the other two optimizers in terms of training cost and recognition accuracy. Also, the effect of different values of the initial learning rate on the performance of the Adam optimizer is investigated experimentally. The result revealed that lower values lead to converges more rapidly.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129564160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738791
Tahereh Zarrat Ehsan, M. Nahvi, Seyed Mehdi Mohtavipour
Violent crime is one of the main causes of death and mental disorders among adults worldwide. It increases emotional distress in families and communities, including depression, anxiety, and post-traumatic stress disorder. Automatic violence detection in surveillance cameras is therefore an important research area for preventing physical and mental harm. Previous human behavior classifiers learn both normal and violent patterns in order to categorize new, unknown samples; however, few large datasets with diverse violent actions exist, so such classifiers cannot generalize well to unseen situations. This paper introduces a novel unsupervised network based on motion acceleration patterns that derives abstract, discriminative features from input samples. The network is built on an AutoEncoder architecture and requires only normal samples in the training phase. Classification is performed with a one-class classifier to distinguish violent from normal actions. Results on the Hockey and Movie datasets show that the proposed network achieves outstanding accuracy and generality compared to state-of-the-art violence detection methods.
{"title":"DABA-Net: Deep Acceleration-Based AutoEncoder Network for Violence Detection in Surveillance Cameras","authors":"Tahereh Zarrat Ehsan, M. Nahvi, Seyed Mehdi Mohtavipour","doi":"10.1109/MVIP53647.2022.9738791","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738791","url":null,"abstract":"Violent crime is one of the main reasons for death and mental disorder among adults worldwide. It increases the emotional distress of families and communities, such as depression, anxiety, and post-traumatic stress disorder. Automatic violence detection in surveillance cameras is an important research area to prevent physical and mental harm. Previous human behavior classifiers are based on learning both normal and violent patterns to categorize new unknown samples. There are few large datasets with various violent actions, so they could not provide sufficient generality in unseen situations. This paper introduces a novel unsupervised network based on motion acceleration patterns to derive and abstract discriminative features from input samples. This network is constructed from an AutoEncoder architecture, and it is required only to use normal samples in the training phase. The classification has been performed using a one-class classifier to specify violent and normal actions. Obtained results on Hockey and Movie datasets showed that the proposed network achieved outstanding accuracy and generality compared to the state-of-the-art violence detection methods.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117008885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-02-23 | DOI: 10.1109/MVIP53647.2022.9738770
Azadeh Torkashvand, A. Behrad
Sparse representation based on dictionary learning has been widely used in many applications over the past decade. In this article, a new method is proposed for removing noise from video images using sparse representation and a trained dictionary. To enhance the noise removal capability, the proposed method is combined with a block matching algorithm that exploits the temporal dependency between video frames and increases the quality of the output images. Simulations performed on different test data show that the proposed algorithm performs well in terms of output video image quality.
{"title":"Video Denoising using Temporal Coherency of Video Frames and Sparse Representation","authors":"Azadeh Torkashvand, A. Behrad","doi":"10.1109/MVIP53647.2022.9738770","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738770","url":null,"abstract":"Sparse representation based on dictionary learning has been widely used in many applications over the past decade. In this article, a new method is proposed for removing noise from video images using sparse representation and a trained dictionary. To enhance the noise removal capability, the proposed method is combined with a block matching algorithm to take the advantage of the temporal dependency of video images and increase the quality of the output images. The simulations performed on different test data show the appropriate response of the proposed algorithm in terms of video image output quality.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131017073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}