Video Face Tracking for IoT Big Data using Improved Swin Transformer based CSA Model

Journal of Machine and Computing. Pub Date: 2024-04-05. DOI: 10.53759/7669/jmc202404029

Anbumani K, Cuddapah Anitha, Achuta Rao S V, Praveen Kumar K, Meganathan Ramasamy, M. R.

Citations: 0

Abstract

Although convolutional neural networks (CNNs) have greatly improved face-related algorithms, balancing accuracy and efficiency in real-world applications remains difficult. State-of-the-art approaches use deeper networks to improve performance, but the added computational complexity and parameter count make them impractical for mobile applications. To address these issues, this article presents an object detection model that combines DeepLabv3+ with a Swin Transformer and incorporates two components, GLTB and Swin-Conv-Dspp (SCD). First, to lessen the impact of the hole phenomenon and the loss of fine-grained information, the model employs the SCD component, which efficiently extracts feature information from objects of various sizes. Second, to address the difficulty of recognizing occluded objects, the study builds a GLTB with a spatial pyramid pooling shuffle module, which extracts important detail from the few visible pixels of the occluded objects. A crocodile search algorithm (CSA) improves classification accuracy by guiding the selection of the model's fine-tuning settings. The proposed model is validated experimentally on the WFLW benchmark dataset. Compared with other lightweight models, it delivers higher performance with significantly fewer parameters and lower computational complexity.
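
The abstract does not define the "hole phenomenon". A common reading, assumed here, is the gridding effect of atrous (dilated) convolution as used in DeepLabv3+: stacking layers with the same dilation rate leaves input positions that never contribute to an output. The sketch below is not the paper's SCD component; it is a minimal 1D illustration with hypothetical helper names (`receptive_offsets`, `coverage`) that counts which input offsets a stack of dilated convolutions can reach.

```python
def receptive_offsets(kernel_size, dilations):
    """Input offsets (relative to one output position) reached by a stack
    of 1D dilated convolutions with an odd kernel size.

    Hypothetical helper for illustration; not taken from the paper.
    """
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        # Each layer samples taps spaced d apart around every reachable offset.
        taps = [k * d for k in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)


def coverage(kernel_size, dilations):
    """Fraction of positions inside the receptive span that are actually sampled."""
    offs = receptive_offsets(kernel_size, dilations)
    span = offs[-1] - offs[0] + 1
    return len(offs) / span


if __name__ == "__main__":
    # Same rate twice: only even offsets are reached, so holes remain.
    print(receptive_offsets(3, [2, 2]), coverage(3, [2, 2]))
    # Mixed rates: every offset in the span is reached.
    print(receptive_offsets(3, [1, 2]), coverage(3, [1, 2]))
```

Under this assumption, repeating one dilation rate samples only a sparse grid of the input, while mixing rates fills the gaps. This is one plausible motivation for a multi-scale component such as the SCD described above.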
