
2022 International Conference on Machine Vision and Image Processing (MVIP): Latest Publications

Feature Line Based Feature Reduction of Polarimetric-Contextual Feature Cube for Polarimetric SAR Classification
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738772
M. Imani
Extraction of discriminative features is an important step in any classification problem, including synthetic aperture radar (SAR) image classification. Polarimetric SAR (PolSAR) images, with rich spatial features in the first two dimensions and polarimetric characteristics in the third, are a rich source of information for producing classification maps of the ground surface. Applying spatial operators such as morphological filters by reconstruction increases the data dimensionality of the PolSAR cube, which then requires feature reduction. In this work, median-mean and feature line embedding (MMFLE) is proposed for dimensionality reduction of the polarimetric-contextual cube in PolSAR images. By utilizing the median-mean line metric, MMFLE is stable with respect to outliers. Through an appropriate definition of scatter matrices, MMFLE maximizes class separability. In addition, MMFLE is an especially strong feature reduction method when only a small training set is available, because it uses the feature line metric to model data variations and generate virtual samples. With 10 training samples per class, MMFLE achieves 94.15% and 83.01% overall classification accuracy on the Flevoland and San Francisco PolSAR datasets acquired by AIRSAR, respectively.
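The feature line metric at the heart of MMFLE measures the distance from a query sample to the line through a pair of training samples; the projection onto that line acts as a generated virtual sample. A minimal sketch of that distance (function names are illustrative, not from the paper):

```python
import numpy as np

def feature_line_distance(q, x1, x2):
    """Distance from query q to the feature line through samples x1, x2.

    The orthogonal projection p of q onto the line is the "virtual sample"
    that lets one pair of training points represent intermediate variations.
    """
    d = x2 - x1
    t = np.dot(q - x1, d) / np.dot(d, d)  # position along the line
    p = x1 + t * d                        # virtual sample (projection point)
    return np.linalg.norm(q - p), p
```

For example, the query (0.5, 1) projects onto the line through (0, 0) and (1, 0) at the virtual sample (0.5, 0), at distance 1.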
Citations: 0
Novel Gaussian Mixture-based Video Coding for Fixed Background Video Streaming
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738789
Mohammadreza Ghafari, A. Amirkhani, E. Rashno, Shirin Ghanbari
In recent years, tremendous advances have been made in Artificial Intelligence (AI) algorithms in the field of image processing. Despite these advances, video compression using AI algorithms has always faced major challenges, which typically lie in two areas: higher processing load in comparison with traditional video compression methods, and lower visual quality in the video content. Studying and addressing these two challenges is the main motivation of this article, in which we introduce a new AI-based video compression method. Since the challenge of processing load is especially pressing in online systems, we evaluate our AI video encoder in video streaming applications. One of the most common video streaming applications is traffic cameras and video surveillance in road environments, referred to here as CCTV. Our idea exploits the fixed background in this type of system: bandwidth is used inefficiently because the streamed video repeatedly transmits duplicate background images. Our AI-based video encoder detects the fixed background using background subtraction and caches it at the client side. By separating the background image from the moving objects, only the moving objects need to be sent to the destination, which saves a great deal of network bandwidth. Our experimental results show that, in exchange for an acceptable reduction in visual quality, the video compression processing load is drastically reduced.
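The core idea, caching a background model at the client and transmitting only foreground pixels, can be sketched with a simple running-average background (a single-Gaussian-per-pixel stand-in for the paper's Gaussian mixture model; names and thresholds are illustrative):

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    # Exponential running average of past frames: a one-Gaussian-per-pixel
    # approximation of a Gaussian mixture background model.
    return (1.0 - alpha) * bg + alpha * frame.astype(np.float64)

def foreground_mask(bg, frame, thresh=25.0):
    # Pixels that deviate strongly from the cached background belong to
    # moving objects and are the only ones that need to be streamed.
    return np.abs(frame.astype(np.float64) - bg) > thresh
```

In a streaming setting, the encoder would send `frame[mask]` plus the mask instead of the full frame whenever the background is already cached at the client.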
Citations: 2
I-GANs for Synthetical Infrared Images Generation
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738551
Mohammad Mahdi Moradi, R. Ghaderi
Because infrared images are insensitive to changes in light intensity and weather conditions, they are used in many surveillance systems and other fields. However, despite all the applications and benefits of these images, many applications lack sufficient data due to costly, time-consuming, and complicated data preparation. Two deep neural networks based on Conditional Generative Adversarial Networks are introduced to solve this problem and produce synthetic infrared images. The first model applies only to problems where paired visible and infrared images are available, so the mapping between the two domains can be learned directly. Since many problems involve unpaired data, a second network is proposed whose goal is to obtain a mapping from visible to infrared images such that the distribution of synthetic infrared images is indistinguishable from that of real ones. Two publicly available datasets have been used to train and test the proposed models. The results demonstrate that the proposed system improves on previous models by 4.6199% in peak signal-to-noise ratio (PSNR) and 3.9196% in structural similarity index measure (SSIM).
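The PSNR figure used in the evaluation follows directly from the mean squared error between a generated image and its reference; a minimal sketch:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher means the synthetic image
    # is closer to the reference. Identical images give infinite PSNR.
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```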
Citations: 1
Unbiased Variable Windows Size Impulse Noise Filter using Genetic Algorithm
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738757
Mehdi Sadeghibakhi, Seyed Majid Khorashadizadeh, Reza Behboodi, A. Latif
This paper proposes an Unbiased Variable Windows Size impulse noise filter (UVWS) that uses a genetic algorithm to effectively restore images corrupted at either high or low noise densities. The method consists of three stages. First, all pixels are classified as noisy or noise-free based on their intensities. In the second stage, the noisy pixels are pushed into a priority list sorted in descending order, where the priority of each pixel is the number of noise-free pixels in its local neighborhood window. Finally, for each pixel in the list, a local weighted average is computed, with the weight of each neighbor optimized by the genetic algorithm (GA). The performance of the proposed method is evaluated on several benchmark images and compared with four methods from the literature. The results show that the proposed method performs better in terms of visual quality and PSNR, especially when the noise density is very high.
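The three stages can be sketched for salt-and-pepper noise, where noisy pixels are the intensity extremes; here uniform neighbour weights stand in for the GA-optimized ones, and all names are illustrative:

```python
import numpy as np

def restore_impulse(img):
    # Stage 1: classify pixels by intensity (0 and 255 assumed noisy).
    img = img.astype(np.float64)
    noisy = (img == 0) | (img == 255)
    H, W = img.shape

    def clean_neighbours(y, x):
        # Noise-free pixels in the 3x3 window around (y, x).
        vals = []
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (dy or dx) and 0 <= ny < H and 0 <= nx < W and not noisy[ny, nx]:
                    vals.append(img[ny, nx])
        return vals

    # Stage 2: process noisy pixels in descending order of the number of
    # noise-free neighbours (the paper's priority list).
    order = sorted(zip(*np.nonzero(noisy)),
                   key=lambda p: -len(clean_neighbours(*p)))

    # Stage 3: weighted average of the clean neighbours (uniform weights
    # here; the GA would tune per-neighbour weights).
    out = img.copy()
    for y, x in order:
        vals = clean_neighbours(y, x)
        if vals:
            out[y, x] = np.average(vals)
    return out
```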
Citations: 0
A Low Area and Low Power Pulse Width Modulation Based Digital Pixel Sensor
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738738
E. Talebi, S. Sayedi
A novel low-power and low-area Digital Pixel Sensor using the Pulse Width Modulation technique is designed in a one-poly six-metal 0.18 μm CMOS standard technology. The pixel has a pitch of 18.33 μm and a fill factor of about 24%. The Light-to-Time Converter (LTC) at the core of the pixel consumes only 5.5% of the pixel area. Post-layout simulation results exhibit a 90.85 dB dynamic range, total power consumption of about 853.65 pW at 33 frames per second, and a short conversion time with a maximum of 23.25 ms. The pixel's digital output is linearized using look-up-table-based digital linearization circuitry, resulting in a root-mean-square pixel-wise error of 0.797 between the original and captured images. Monte Carlo analysis shows 2.93% fixed pattern noise for the pixel.
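In a PWM pixel the photocurrent discharges the sense-node capacitance, so the pulse width is inversely proportional to light intensity, and a look-up table then linearizes the digital output. A toy model of this relationship, with assumed (not the paper's) capacitance and voltage-swing values:

```python
def pwm_pulse_width(i_photo, c_node=10e-15, v_swing=1.0):
    # Pulse width T = C * Vswing / I_photo: bright pixels trip the
    # comparator quickly, dim pixels slowly, which is why the maximum
    # conversion time occurs at the lowest light level.
    return c_node * v_swing / i_photo

def linearize(t, c_node=10e-15, v_swing=1.0):
    # LUT-style inversion recovering a value proportional to intensity.
    return c_node * v_swing / t
```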
Citations: 0
Evaluation of the Image Processing Technique in Interpretation of Polar Plot Characteristics of Transformer Frequency Response
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738771
Ahmad Vosoughi, Mohammad Hamed Samimi
Frequency response analysis (FRA) is one of the most efficient methods for diagnosing mechanical faults in power transformers. Digital image processing of the FRA polar plot characteristics has recently been proposed in the literature for interpreting the frequency response of power transformers. An important advantage of this method is that it uses the phase angle of the FRA trace in addition to its amplitude in the analysis. The digital image processing techniques applied to the FRA polar plot detect faults by extracting and analyzing different features of the image through texture analysis. In this study, the performance of this method is investigated on real windings to examine its ability to detect the fault extent and type. This step is necessary because the method is new in the FRA field and has so far been investigated only in simulation. Three different faults, namely axial displacement, disk space variation, and radial deformation, are implemented in the experimental setup. The results show that the approach can determine neither the fault extent nor the fault type. Therefore, essential changes must be made to the method before it can be applied in the field.
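The polar plot under analysis maps each frequency point's magnitude and phase to a point in the plane, producing the curve whose image texture is then examined. A sketch of that mapping (illustrative, not the authors' code):

```python
import numpy as np

def polar_plot_points(H):
    # Magnitude as radius, phase as angle. Note that r*cos(theta) and
    # r*sin(theta) recover the real and imaginary parts of H(f), so the
    # polar plot uses both amplitude and phase of the FRA trace.
    r = np.abs(H)
    theta = np.angle(H)
    return r * np.cos(theta), r * np.sin(theta)
```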
Citations: 0
Designing an Improved Deep Learning-Based Classifier for Breast Cancer Identification in Histopathology Images
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738774
Amirreza BabaAhmadi, Sahar Khalafi, Fatemeh Malekipour Esfahani
Cancer is a rampant phenomenon caused by uncontrollable cells that grow and spread throughout the body. Invasive Ductal Carcinoma (IDC) is the most common type of breast cancer and can be fatal for females if not detected early. As a result, prompt diagnosis is critical to maximizing surveillance rates while minimizing long-term mortality rates. Nowadays, modern computer vision and deep learning techniques have transformed the medical image analysis arena. Computer vision applications in medical image analysis have provided remarkable results, enhanced accuracy, and reduced costs. The main purpose of designing a new algorithm to detect unusual patches in breast images was to achieve both high accuracy and low computational cost simultaneously. Therefore, a novel architecture has been designed by utilizing Xception and MobileNetV2. This new algorithm achieves 93.4% balanced accuracy and 94.8% F1-score, outperforming previously published deep learning algorithms for identifying IDC histopathology images.
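The two reported metrics follow directly from confusion-matrix counts; balanced accuracy is the relevant headline number because IDC patches are typically a minority class. A minimal sketch:

```python
def balanced_accuracy(tp, fp, tn, fn):
    # Mean of sensitivity and specificity; unlike plain accuracy, it is
    # not inflated by a dominant negative (non-IDC) class.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall for the positive (IDC) class.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```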
Citations: 1
An Empirical Study of the Performance of Different Optimizers in the Deep Neural Networks
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738743
A. Zohrevand, Z. Imani
In recent years, Stochastic Gradient Descent (SGD) has commonly been used as the optimizer in Convolutional Neural Network (CNN) models. While many researchers have adopted CNN models for classification tasks, to the best of our knowledge, the different optimizers developed for CNNs have not been thoroughly studied and analyzed in CNN training. In this paper, attempts have been made to investigate the effects of various optimizers on the performance of CNNs. Two sets of experiments are conducted. First, for classification on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch with four different optimizers: SGD, Adam, Adadelta, and AdaGrad. Second, using the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify Persian handwritten words. In both experiments, the results show that Adam and AdaGrad behave similarly and achieve higher performance than the other two optimizers in terms of training cost and recognition accuracy. The effect of different initial learning rates on the performance of the Adam optimizer is also investigated experimentally; the results reveal that lower values lead to more rapid convergence.
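The update rules being compared differ in how they scale the gradient. Plain SGD and Adam on a 1-D quadratic illustrate the contrast (a textbook sketch, not the paper's training code):

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    # Plain SGD: a fixed-size step along the negative gradient.
    return w - lr * g

def adam_step(w, g, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: step scaled by bias-corrected first and second gradient
    # moments, giving an adaptive per-parameter learning rate.
    m, v = state
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v)

# Minimize f(w) = w^2 (gradient 2w) starting from w = 1.0.
w_sgd = w_adam = 1.0
state = (0.0, 0.0)
for t in range(1, 101):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_adam, state = adam_step(w_adam, 2 * w_adam, state, t)
```

Both optimizers drive `w` toward the minimum at 0; the lead-in's `lr=0.1` is an illustrative value, echoing the paper's finding that the initial learning rate strongly shapes Adam's convergence.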
Citations: 3
DABA-Net: Deep Acceleration-Based AutoEncoder Network for Violence Detection in Surveillance Cameras
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738791
Tahereh Zarrat Ehsan, M. Nahvi, Seyed Mehdi Mohtavipour
Violent crime is one of the main causes of death and mental disorders among adults worldwide. It increases emotional distress in families and communities, including depression, anxiety, and post-traumatic stress disorder. Automatic violence detection in surveillance cameras is an important research area for preventing physical and mental harm. Previous human behavior classifiers learn both normal and violent patterns to categorize new, unknown samples; because few large datasets cover a variety of violent actions, such classifiers cannot provide sufficient generality in unseen situations. This paper introduces a novel unsupervised network based on motion acceleration patterns that derives and abstracts discriminative features from input samples. The network is built on an AutoEncoder architecture and requires only normal samples in the training phase. Classification is performed with a one-class classifier to distinguish violent from normal actions. Results on the Hockey and Movie datasets show that the proposed network achieves outstanding accuracy and generality compared to state-of-the-art violence detection methods.
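Since the autoencoder is trained on normal samples only, the one-class decision rule reduces to thresholding its reconstruction error; a minimal sketch of that rule (function names and the threshold are illustrative):

```python
import numpy as np

def reconstruction_scores(x, x_rec):
    # Per-sample mean squared reconstruction error of the autoencoder.
    return np.mean((x - x_rec) ** 2, axis=1)

def classify_violent(x, x_rec, thresh):
    # Samples the normal-only-trained autoencoder fails to reconstruct
    # are flagged as violent/anomalous.
    return reconstruction_scores(x, x_rec) > thresh
```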
Video Denoising using Temporal Coherency of Video Frames and Sparse Representation
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738770
Azadeh Torkashvand, A. Behrad
Sparse representation based on dictionary learning has been widely used in many applications over the past decade. This article proposes a new method for removing noise from video images using sparse representation and a trained dictionary. To enhance noise removal, the method is combined with a block-matching algorithm that exploits the temporal dependency of video frames and increases the quality of the output images. Simulations on different test data show that the proposed algorithm performs well in terms of output video quality.
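The core sparse-coding step behind such methods can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: the paper learns its dictionary from training data and adds block matching across frames, while this minimal NumPy sketch uses a fixed orthonormal DCT dictionary and orthogonal matching pursuit (OMP) to denoise a single signal patch.

```python
import numpy as np

rng = np.random.default_rng(1)

def dct_dictionary(n, k):
    # DCT dictionary: n-dimensional atoms, k atoms, unit-norm columns
    D = np.cos(np.outer(np.arange(n) + 0.5, np.arange(k)) * np.pi / k)
    return D / np.linalg.norm(D, axis=0)

def omp(D, y, sparsity):
    # Orthogonal matching pursuit: greedily pick the best-correlated
    # atom, then re-fit all selected coefficients by least squares
    residual, idx = y.copy(), []
    for _ in range(sparsity):
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

n = 16
D = dct_dictionary(n, n)  # complete orthonormal DCT basis

# Clean patch = sparse combination of 3 atoms; observation = clean + noise
clean = D[:, [3, 7, 12]] @ np.array([1.0, -0.7, 0.5])
noisy = clean + rng.normal(0.0, 0.05, n)

# Denoise: sparse-code the noisy patch, then reconstruct from the code
denoised = D @ omp(D, noisy, sparsity=3)

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(err_noisy, err_denoised)
```

Projecting the noisy patch onto a few dictionary atoms discards the noise energy that falls outside the sparse support, so the reconstruction lies closer to the clean patch than the noisy observation does; the paper's block-matching step extends this idea by jointly coding similar patches found across neighboring frames.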
Journal
2022 International Conference on Machine Vision and Image Processing (MVIP)