DEEP LEARNING TECHNOLOGY FOR VIDEOFRAME PROCESSING IN FACE SEGMENTATION ON MOBILE DEVICES

Herald of Advanced Information Technology. Pub Date: 2021-06-30. DOI: 10.15276/hait.02.2021.7

V. Ruvinskaya, Yurii Yu. Timkov



Abstract

The aim of this research is to reduce frame processing time for face segmentation in video on mobile devices by using deep learning. The paper analyzes the advantages and disadvantages of existing segmentation methods and their applicability to various tasks. It also compares the real-time face segmentation implementations in the most popular mobile applications that add visual effects to video. The analysis showed that classical segmentation methods do not offer a suitable combination of accuracy and speed and require manual tuning for each task, whereas neural network-based methods determine deep features automatically and achieve high accuracy at an acceptable speed. A method based on convolutional neural networks was chosen because, in addition to sharing the advantages of other neural network-based methods, it does not require as many computing resources during execution. Based on a review of existing convolutional neural networks for segmentation, the DeepLabV3+ network was selected for its sufficiently high accuracy and its optimization for mobile devices. The structure of the selected network was modified to suit two-class segmentation and to run faster on low-performance devices. For further acceleration, 8-bit quantization was applied to the values processed by the network. The network was adapted to face segmentation by transfer learning on a set of face images from the COCO dataset. Using the modified and additionally trained segmentation model, a mobile app was created that records video with real-time visual effects; it uses segmentation to apply effects separately to two zones: the face (color filters, brightness adjustment, animated effects) and the background (blurring, hiding, replacement with another image).
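As an illustration of the 8-bit quantization step mentioned above, the following is a minimal Python sketch of affine (scale and zero-point) quantization. The abstract does not state which quantization scheme the authors used, so the scheme itself, the function names (`quant_params`, `quantize`, `dequantize`), and the sample values are assumptions made for illustration only.

```python
def quant_params(lo, hi, bits=8):
    """Return (scale, zero_point) mapping the float range [lo, hi] onto 0..2**bits - 1.

    Hypothetical helper: the paper's actual quantization scheme is not given.
    """
    qmax = 2 ** bits - 1
    # Widen the range to include 0.0 so that zero is represented exactly.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, bits=8):
    """Map floats to integers in 0..2**bits - 1, clamping out-of-range values."""
    qmax = 2 ** bits - 1
    return [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate float values from their quantized form."""
    return [scale * (q - zero_point) for q in q_values]


if __name__ == "__main__":
    activations = [-1.0, 0.0, 0.5, 3.0]  # made-up sample values
    scale, zero_point = quant_params(min(activations), max(activations))
    q = quantize(activations, scale, zero_point)
    print(q)
    print(dequantize(q, scale, zero_point))
```

Each value is stored in one byte instead of four, and the round trip through `dequantize` introduces a small error on the order of `scale`. This is consistent with the trade-off the abstract reports: faster processing at a slight cost in accuracy.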
Frame processing time in the application was tested on mobile devices with different technical characteristics. The test results for segmentation with the obtained model were compared with those for segmentation with the normalized cuts method. The comparison shows that frame processing time decreases on the majority of devices, with a slight decrease in segmentation accuracy.
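The kind of per-frame timing measurement described above could be sketched as follows. This is not the authors' test procedure, which the abstract does not detail. The helper names (`measure_frame_times`, `summarize`), the warm-up count, and the stand-in `fake_segment` function are hypothetical, and a real test would call the segmentation model on camera frames on the device itself.

```python
import time
from statistics import mean


def measure_frame_times(process_frame, frames, warmup=2):
    """Return the processing time of each frame in milliseconds."""
    # Warm-up calls are not timed, so one-time setup cost does not skew results.
    for frame in frames[:warmup]:
        process_frame(frame)
    times_ms = []
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms


def summarize(times_ms):
    """Summarize timings as mean and worst-case latency plus frames per second."""
    avg = mean(times_ms)
    fps = 1000.0 / avg if avg > 0 else float("inf")
    return {"mean_ms": avg, "max_ms": max(times_ms), "fps": fps}


def fake_segment(frame):
    """Stand-in for a segmentation model: thresholds each pixel value."""
    return [pixel > 127 for pixel in frame]


if __name__ == "__main__":
    # Ten synthetic "frames" of 10,000 grayscale pixel values each.
    frames = [[i % 256 for i in range(10_000)] for _ in range(10)]
    print(summarize(measure_frame_times(fake_segment, frames)))
```

Running the same harness with two different `process_frame` functions on the same frames gives directly comparable numbers, which is the shape of the comparison the abstract describes between the neural network model and the normalized cuts method.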


