Barrier Function to Skin Elasticity in Talking Head

Cognitive Computation (IF 4.3; Zone 3, Computer Science; JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2024-08-24. DOI: 10.1007/s12559-024-10344-7

Iti Chaturvedi, Vlad Pandelea, Erik Cambria, Roy Welsch, Bithin Datta


Citations: 0

Abstract



Barrier Function to Skin Elasticity in Talking Head

In this paper, we target the problem of generating facial expressions from a piece of audio. This is challenging because audio and video each have inherent characteristics that are distinct from the other's. Some words may have identical lip movements, and speech impediments may prevent lip-reading in some individuals. Previous approaches to generating such a talking head suffered from stiff expressions because they focused only on lip movements, and the facial landmarks did not contain the information flow from the audio. Hence, in this work, we employ spatio-temporal independent component analysis to accurately sync the audio with the corresponding face video. Proper word formation also requires control over the face muscles, which can be captured using a barrier function. We first validated the approach on the diffusion of salt water in coastal areas using a synthetic finite element simulation. Next, we applied it to 3D facial expressions in toddlers, for which training data is difficult to capture. Prior knowledge in the form of rules is specified using fuzzy logic, and multi-objective optimization is used to collectively learn a set of rules. We observed significantly higher F-measure on three real-world problems.
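The abstract reports its results as F-measure but does not say how the score was computed (binary or multi-class, which averaging, which beta). As a rough illustration only, the sketch below shows the standard binary F-beta score, the weighted harmonic mean of precision and recall. The function name `f_measure` and the toy labels are hypothetical and are not taken from the paper.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """Return (precision, recall, F-beta) for binary labels, where 1 = positive.

    Assumption: a plain binary setting; the paper's exact evaluation
    protocol is not given in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0  # avoid division by zero

    b2 = beta * beta
    score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, score


# Toy labels (hypothetical): 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
precision, recall, f1 = f_measure(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

With `beta=1.0` precision and recall are weighted equally; a larger `beta` weights recall more heavily.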


Source journal

Cognitive Computation (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; NEUROSCIENCES)

CiteScore: 9.30
Self-citation rate: 3.70%
Articles published: 116
Review time: >12 weeks

About the journal: Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically-inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.