Yuhe Fan, Lixun Zhang, Canxing Zheng, Xingyuan Wang, Jinghui Zhu, Lan Wang
DOI: 10.1007/s00530-024-01472-z
Published: 2024-09-11 (Journal Article)
Instance segmentation of faces and mouth-opening degrees based on improved YOLOv8 method
Instance segmentation of faces and mouth-opening degrees is an important technology for meal-assisting robotics in food delivery safety. However, because faces vary widely in shape, color, and posture, and because the mouth has a small contour area, deforms easily, and is often occluded, real-time and accurate instance segmentation is challenging. In this paper, we propose a novel method for instance segmentation of faces and mouth-opening degrees. Specifically, in the backbone network, deformable convolution is introduced to enhance the ability to capture finer-grained spatial information, and the CloFormer module is introduced to improve the ability to capture high-frequency local and low-frequency global information. In the neck network, classical convolution and C2f modules are replaced by GSConv and VoV-GSCSP aggregation modules, respectively, to reduce model complexity and floating-point operations. Finally, in the localization loss, the CIoU loss is replaced by the WIoU loss to reduce the competitiveness of high-quality anchor boxes and mask the influence of low-quality samples, which in turn improves localization accuracy and generalization ability. The resulting model is abbreviated DCGW-YOLOv8n-seg. The DCGW-YOLOv8n-seg model was compared with the baseline YOLOv8n-seg model and with several state-of-the-art instance segmentation models on our datasets. The results show that the DCGW-YOLOv8n-seg model offers high accuracy, speed, robustness, and generalization ability. The contribution of each improvement to model performance was verified by ablation experiments. Finally, the DCGW-YOLOv8n-seg model was applied to instance segmentation experiments on a meal-assisting robot. The results show that the model effectively segments faces and mouth-opening degrees. The proposed method provides a theoretical basis for meal-assisting robotics in food delivery safety and a useful reference for computer vision and image instance segmentation.
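The abstract states that the CIoU localization loss is replaced by WIoU. As a rough illustration only (not the authors' implementation), the sketch below shows a WIoU-v1-style loss as described in the wider detection literature: the plain IoU loss is scaled by a distance-based focusing factor computed from the smallest box enclosing the prediction and the ground truth. The function names and the `(x1, y1, x2, y2)` box format are assumptions made for this example.

```python
import math


def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, gt):
    """WIoU-v1-style loss: (1 - IoU) scaled by exp(center distance^2 / enclosing-box diagonal^2).

    The factor exceeds 1 when the box centers are misaligned, emphasizing
    poorly localized predictions; in training frameworks the denominator is
    typically detached from the gradient graph.
    """
    l_iou = 1.0 - iou(pred, gt)
    # Box centers.
    pcx, pcy = (pred[0] + pred[2]) / 2.0, (pred[1] + pred[3]) / 2.0
    gcx, gcy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    # Smallest box enclosing both.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r = math.exp(((pcx - gcx) ** 2 + (pcy - gcy) ** 2) / (wg ** 2 + hg ** 2))
    return r * l_iou
```

For perfectly aligned boxes the factor is 1 and the loss is 0; as the centers drift apart, the exponential factor grows and the loss rises faster than the plain IoU loss, which is the "reduced competitiveness of high-quality anchors" effect the abstract refers to.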