
Latest Publications: 2022 International Conference on Machine Vision and Image Processing (MVIP)

Tumor Detection in Brain MRI using Residual Convolutional Neural Networks
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738767
Mohammad Reza Obeidavi, K. Maghooli
Brain tumors are among the complications with a high mortality rate, and early detection can help treat this type of cancer. Among tumor detection methods, magnetic resonance imaging (MRI) is a common one, but there is an ongoing effort to detect tumors automatically in medical images. Therefore, this paper introduces a method for automatic tumor detection in MRI images using residual neural networks. Tests of the proposed neural network on the BRATS data set demonstrate the efficiency of the proposed method.
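As a rough illustration of the approach described above, the sketch below builds a small residual CNN classifier for MRI slices in PyTorch. The block layout, channel counts, and input size are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal sketch of a residual CNN for binary tumor classification on MRI
# slices (illustrative assumptions; not the paper's exact architecture).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                        # skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)    # residual addition

class TumorNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 32, 7, stride=2, padding=3),
                                  nn.BatchNorm2d(32), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(ResidualBlock(32), ResidualBlock(32))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, num_classes))

    def forward(self, x):                   # x: (N, 1, H, W) MRI slices
        return self.head(self.blocks(self.stem(x)))

logits = TumorNet()(torch.randn(4, 1, 240, 240))  # BRATS-like slice size
```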
Citations: 3
Deep Autoencoder Multi-Exposure HDR Imaging
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738552
A. Omrani, M. Soheili, M. Kelarestaghi
Recently, because cameras capture images with a limited dynamic range, High Dynamic Range (HDR) imaging has attracted considerable attention: HDR pictures present more detail and better luminance than Low Dynamic Range (LDR) images. Moreover, HDR images produced from a single LDR image cannot reconstruct details appropriately. Therefore, this research proposes a deep learning method that generates an HDR picture from multiple LDR pictures with different exposures. Experiments show that the proposed algorithm outperforms other methods in both quantitative and visual comparisons.
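A minimal sketch of the idea, assuming an encoder-decoder that takes three exposure-bracketed LDR frames stacked along the channel axis; the paper's actual network is deeper and differs in detail.

```python
# Sketch of an encoder-decoder that merges three LDR exposures into one HDR
# image (illustrative assumption; layer sizes are not from the paper).
import torch
import torch.nn as nn

class MergeAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: the 3 RGB exposures are stacked along the channel axis.
        self.encoder = nn.Sequential(
            nn.Conv2d(3 * 3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        # Decoder: upsample back to the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.ReLU(inplace=True))

    def forward(self, under, mid, over):    # three (N, 3, H, W) exposures
        x = torch.cat([under, mid, over], dim=1)
        return self.decoder(self.encoder(x))  # non-negative HDR estimate

ldr = [torch.rand(1, 3, 256, 256) for _ in range(3)]
hdr = MergeAutoencoder()(*ldr)
```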
Citations: 0
Light Face: A Light Face Detector for Edge Devices
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738740
Saeed Khanehgir, Amir Mohammad Ghoreyshi, Alireza Akbari, R. Derakhshan, M. Sabokrou
Face detection is one of the most important and basic steps in recognizing and verifying human identity. Deploying convolutional models such as face detectors on edge devices and mobile phones, which have limited memory and low computing power, is difficult and challenging because of the large number of parameters, the computational complexity, and the high power consumption involved. In this paper, a light and fast face detection model is proposed that predicts face boxes in real time with high accuracy. The proposed model is based on the YOLO algorithm with a CSPDarknet53-tiny backbone. Tricks such as computing custom anchor boxes address the detection of faces at varying scales, and optimization techniques such as pruning and quantization reduce the number of parameters and improve speed, making the final model strong and suitable for environments with low computational power. One of our best models achieves a mAP of 67.52% on the WIDER FACE dataset with a size of 1.7 MB and a speed of 1.43 FPS on a mobile phone with ordinary hardware, showing significant performance.
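Of the tricks listed, the custom anchor boxes are the easiest to illustrate: YOLO anchors are conventionally derived by k-means clustering of ground-truth box sizes under an IoU distance. The sketch below is that generic procedure with assumed parameters, not the authors' script.

```python
# K-means over box width/height with an IoU distance, the standard way custom
# YOLO anchors are computed (generic sketch; k and the data are assumptions).
import numpy as np

def iou_wh(boxes, anchors):
    # boxes: (N, 2), anchors: (K, 2); IoU as if boxes share a corner.
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=6, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = iou_wh(boxes, anchors).argmax(axis=1)  # nearest by IoU
        for j in range(k):
            if (assign == j).any():
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]    # small to large

wh = np.abs(np.random.randn(1000, 2)) * 50 + 20  # stand-in for face box sizes
print(kmeans_anchors(wh))
```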
Citations: 1
A face detection method via ensemble of four versions of YOLOs
Pub Date : 2022-02-23 DOI: 10.1109/MVIP53647.2022.9738779
Sanaz Khalili, A. Shakiba
We implemented a real-time ensemble model for face detection by combining the results of YOLO v1 through v4. We used the WIDER FACE benchmark to train YOLOv1 to v4 in the Darknet framework, then ensembled their results with two methods: WBF (Weighted Boxes Fusion) and NMW (Non-Maximum Weighted). The experimental analysis showed that the WBF ensemble increases mAP on the easy, medium, and hard subsets of the dataset by 7.81%, 22.91%, and 12.96%, respectively; the corresponding figures for the NMW ensemble are 6.25%, 20.83%, and 11.11%.
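Both fusion methods are available in the open-source ensemble-boxes Python package, so the ensembling step can be sketched as below; the dummy boxes and thresholds are illustrative, not values from the paper.

```python
# Fusing detections from four YOLO versions with WBF and NMW, via the
# ensemble-boxes package (pip install ensemble-boxes). Boxes and thresholds
# here are dummies for illustration only.
from ensemble_boxes import weighted_boxes_fusion, non_maximum_weighted

# One list entry per model (YOLOv1..v4); coordinates normalized to [0, 1].
boxes_list = [[[0.10, 0.10, 0.40, 0.45]], [[0.12, 0.11, 0.42, 0.44]],
              [[0.09, 0.12, 0.39, 0.46]], [[0.11, 0.10, 0.41, 0.45]]]
scores_list = [[0.90], [0.85], [0.70], [0.95]]
labels_list = [[0], [0], [0], [0]]          # single class: face

wbf_boxes, wbf_scores, wbf_labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list, iou_thr=0.55, skip_box_thr=0.1)
nmw_boxes, nmw_scores, nmw_labels = non_maximum_weighted(
    boxes_list, scores_list, labels_list, iou_thr=0.55, skip_box_thr=0.1)
print(wbf_boxes, nmw_boxes)
```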
Citations: 5
Real-Time Facial Expression Recognition using Facial Landmarks and Neural Networks
Pub Date : 2022-01-31 DOI: 10.1109/MVIP53647.2022.9738754
M. Haghpanah, Ehsan Saeedizade, M. T. Masouleh, A. Kalhor
This paper presents a lightweight algorithm for feature extraction, classification of seven different emotions, and real-time facial expression recognition based on static images of the human face. A Multi-Layer Perceptron (MLP) neural network is trained on top of this algorithm. To classify human faces, pre-processing is first applied to the input image to localize and crop faces from it. Next, a facial landmark detection library detects the landmarks of each face. The face is then split into upper and lower halves, enabling the extraction of the desired features from each part. The proposed model takes both geometric and texture-based feature types into account. After the feature extraction phase, a normalized feature vector is created. A 3-layer MLP trained on these feature vectors reaches 96% accuracy on the test set.
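A hedged sketch of the pipeline's final stages: an assumed geometric feature vector is built from upper- and lower-face landmark groups, normalized, and fed to a 3-hidden-layer MLP (here scikit-learn's MLPClassifier on dummy data; the split indices and features are illustration only, not the paper's).

```python
# Sketch: split 68 facial landmarks into upper/lower groups, build a
# normalized geometric feature vector, train a 3-hidden-layer MLP.
import numpy as np
from sklearn.neural_network import MLPClassifier

def features(landmarks):                    # landmarks: (68, 2) per face
    upper, lower = landmarks[:27], landmarks[27:]  # illustrative split
    def part_feats(pts):
        c = pts.mean(axis=0)
        d = np.linalg.norm(pts - c, axis=1)        # distances to centroid
        return np.concatenate([d, (pts - c).ravel()])
    v = np.concatenate([part_feats(upper), part_feats(lower)])
    return v / (np.linalg.norm(v) + 1e-8)          # normalized vector

X = np.stack([features(np.random.rand(68, 2)) for _ in range(200)])
y = np.random.randint(0, 7, 200)                   # 7 emotion classes (dummy)
clf = MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=300).fit(X, y)
```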
Citations: 4
Deep Curriculum Learning for PolSAR Image Classification
Pub Date : 2021-12-26 DOI: 10.1109/MVIP53647.2022.9738781
Hamid Mousavi, M. Imani, H. Ghassemian
Following the great success of curriculum learning in machine learning, a novel deep curriculum learning method, entitled DCL, is proposed in this paper, particularly for the classification of fully polarimetric synthetic aperture radar (PolSAR) data. The method uses entropy-alpha target decomposition to estimate the complexity of each PolSAR image patch before feeding it to a convolutional neural network (CNN), and an accumulative mini-batch pacing function introduces progressively more difficult patches to the CNN. Experiments on the widely used AIRSAR Flevoland data set reveal that the proposed curriculum learning method not only increases classification accuracy but also leads to faster training convergence.
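The pacing idea can be sketched as follows, assuming patches are pre-sorted by an entropy-based difficulty score; the schedule below is a generic accumulative pacing function, not the paper's exact one.

```python
# Curriculum ordering with an accumulative pacing function: patches are
# sorted easy-to-hard, and each epoch samples from a growing prefix.
import numpy as np

def accumulative_pacing(epoch, total_epochs, n, start_frac=0.2):
    # Fraction of the sorted data visible at this epoch, growing to 1.0.
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / total_epochs)
    return int(frac * n)

difficulty = np.random.rand(5000)           # stand-in for entropy-alpha scores
order = np.argsort(difficulty)              # easy (low-entropy) patches first
for epoch in range(10):
    visible = order[:accumulative_pacing(epoch, 10, len(order))]
    batch_idx = np.random.choice(visible, size=64, replace=False)
    # train_step(patches[batch_idx], labels[batch_idx])  # CNN update here
```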
Citations: 4
Towards Fine-grained Image Classification with Generative Adversarial Networks and Facial Landmark Detection
Pub Date : 2021-08-28 DOI: 10.1109/MVIP53647.2022.9738759
Mahdieh Darvish, Mahsa Pouramini, H. Bahador
Fine-grained classification remains a challenging task because distinguishing between categories requires learning complex and local differences. Diversity in the pose, scale, and position of objects in an image makes the problem even more difficult. Although recent Vision Transformer models achieve high performance, they need an extensive volume of input data. To address this problem, we made the best use of GAN-based data augmentation to generate extra dataset instances. Oxford-IIIT Pets was our dataset of choice for this experiment: it consists of 37 breeds of cats and dogs with variations in scale, pose, and lighting, which intensifies the difficulty of the classification task. Furthermore, we enhanced the recent Generative Adversarial Network (GAN) StyleGAN2-ADA model to generate more realistic images while preventing overfitting to the training set. We did this by training a customized version of MobileNetV2 to predict animal facial landmarks and cropping images accordingly. Lastly, we combined the synthetic images with the original dataset and compared our proposed method against standard GAN augmentation and no augmentation on different subsets of the training data. We validated our work by evaluating the accuracy of fine-grained image classification with the recent Vision Transformer (ViT) model. Code is available at: https://github.com/mahdi-darvish/GAN-augmented-pet-classifler
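The dataset-combination step can be sketched with standard torchvision utilities, assuming the StyleGAN2-ADA samples are stored in an image folder; the directory names and transforms below are placeholders, not the repository's actual layout.

```python
# Sketch: merge GAN-generated synthetic images with the original Oxford-IIIT
# Pets training images for classifier fine-tuning (paths are assumptions).
import torch
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)),
                          transforms.ToTensor()])

real = datasets.ImageFolder("data/oxford_pets/train", transform=tfm)
synthetic = datasets.ImageFolder("data/stylegan2_ada_samples", transform=tfm)

train_set = torch.utils.data.ConcatDataset([real, synthetic])
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
# for images, labels in loader: fine-tune the ViT classifier here
```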
Citations: 3
Exploring the Properties and Evolution of Neural Network Eigenspaces during Training
Pub Date : 2021-06-17 DOI: 10.1109/MVIP53647.2022.9738741
Mats L. Richter, Leila Malihi, Anne-Kathrin Patricia Windler, U. Krumnack
We investigate the properties and evolution of the emergent inference process inside neural networks using layer saturation [1] and logistic regression probes [2]. We demonstrate that the difficulty of a problem, defined by the number of classes and the complexity of the visual domain, and the number of parameters in neural network layers affect predictive performance in an antagonistic manner. We further show that this relationship can be measured using saturation, which opens up the possibility of detecting over- and under-parameterization of neural networks. The observed effects are independent of previously reported pathological patterns such as the "tail pattern" described in [1]. Finally, we study the emergence of saturation patterns during training, showing that they emerge early; this allows for early analysis and potentially shorter experiment cycles.
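Layer saturation as defined in [1] can be computed from a layer's captured activations as the fraction of covariance eigendirections needed to reach a variance threshold; the sketch below uses the conventional 99% threshold and random activations as a stand-in for a real forward pass.

```python
# Sketch of layer saturation: the fraction of eigendirections of a layer's
# activation covariance needed to explain `delta` of the variance.
import numpy as np

def saturation(acts, delta=0.99):
    # acts: (num_samples, num_features) activations of one layer
    acts = acts - acts.mean(axis=0)
    cov = acts.T @ acts / (len(acts) - 1)
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]      # descending eigenvalues
    ratio = np.cumsum(eig) / eig.sum()
    k = int(np.searchsorted(ratio, delta) + 1)        # dims to reach delta
    return k / acts.shape[1]

acts = np.random.randn(1024, 256) @ np.random.randn(256, 256) * 0.1
print(f"saturation = {saturation(acts):.2f}")
```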
Citations: 2
Lip reading using external viseme decoding
Pub Date : 2021-04-10 DOI: 10.1109/MVIP53647.2022.9738749
J. Peymanfard, M. R. Mohammadi, Hossein Zeinali, N. Mozayani
Lip-reading is the task of recognizing speech from lip movements. It is difficult because the lip movements for some words are similar when they are pronounced. A viseme describes the lip movements made during a conversation. This paper shows how to use external text data (for viseme-to-character mapping) by dividing video-to-character conversion into two stages, namely converting video to visemes and then converting visemes to characters, using separate models. Our proposed method improves the word error rate by an absolute 4% compared to a typical sequence-to-sequence lipreading model on the BBC-Oxford Lip Reading dataset (LRS2).
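The role of the external text can be sketched as follows: a character-to-viseme table converts raw text into viseme sequences, yielding cheap training pairs for the second-stage (viseme-to-character) model. The toy mapping below is an assumption; a real system would use a full phoneme-to-viseme table.

```python
# Sketch of the two-stage idea: external text is mapped to viseme sequences,
# producing (viseme -> character) training pairs. Toy mapping, for illustration.
CHAR_TO_VISEME = {"p": "BILABIAL", "b": "BILABIAL", "m": "BILABIAL",
                  "f": "LABIODENTAL", "v": "LABIODENTAL",
                  "a": "OPEN", "o": "ROUNDED", "e": "SPREAD"}

def text_to_visemes(text):
    """Stage-2 training data: viseme sequence for a line of external text."""
    return [CHAR_TO_VISEME.get(c, "OTHER") for c in text.lower() if c.isalpha()]

corpus = ["map a viseme", "be above"]       # stand-in external text corpus
pairs = [(text_to_visemes(t), list(t)) for t in corpus]
# Stage 1: a video->viseme model predicts sequences like pairs[i][0];
# Stage 2: a seq2seq model trained on `pairs` maps visemes back to characters.
print(pairs[0])
```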
Citations: 6