Pub Date : 2024-04-14 DOI: 10.1109/icassp48485.2024.10446379
Jiyuan Liu, Wenping Wei, Zhendong Li, Guanfeng Li, Hao Liu
{"title":"Invariant Motion Representation Learning for 3D Talking Face Synthesis","authors":"Jiyuan Liu, Wenping Wei, Zhendong Li, Guanfeng Li, Hao Liu","doi":"10.1109/icassp48485.2024.10446379","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10446379","url":null,"abstract":"","PeriodicalId":517764,"journal":{"name":"ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"66 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140705107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-14 DOI: 10.1109/icassp48485.2024.10446134
Bogdan Vlasenko, Sargam Vyas, Mathew Magimai.-Doss
Speech Emotion Recognition (SER) has garnered significant attention over the past two decades. In the early stages of SER technology, 'brute force'-based techniques led to a significant expansion in knowledge-based acoustic feature representation (FR) for modeling sparse emotional data. However, as deep learning techniques have become more powerful, their direct application has been limited by the scarcity of well-annotated emotional data. As a result, pre-trained neural embeddings on large speech corpora have gained popularity for SER tasks. These embeddings leverage existing transfer learning methods suitable for general-purpose self-supervised learning (SSL) representations. Recent studies on downstream SSL techniques for dimensional SER have shown promising results. In this research, we aim to evaluate the emotion-discriminative characteristics of neural embeddings in general cases (out-of-domain) and when fine-tuned for SER (in-domain). Given that most SSL techniques are pre-trained primarily on English speech, we plan to use speech emotion corpora in both language-matched and mismatched conditions. We will assess the discriminative characteristics of both handcrafted and standalone neural embeddings as FRs.
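The abstract contrasts two families of feature representation (FR): knowledge-based "handcrafted" features, where frame-level acoustic descriptors are summarized into a fixed-length vector by statistical functionals, and learned neural embeddings. The following is a minimal, hypothetical sketch of the handcrafted side only; the two descriptors (frame energy, zero-crossing rate) and two functionals (mean, standard deviation) are illustrative stand-ins chosen for brevity, not the paper's actual feature set, which would use a far richer inventory such as eGeMAPS.

```python
import math
import random

def handcrafted_functionals(wave, frame_len=400, hop=160):
    """Toy knowledge-based FR: frame-level low-level descriptors
    (energy, zero-crossing rate) summarized by utterance-level
    functionals (mean, std). Illustrative only."""
    frames = [wave[i:i + frame_len]
              for i in range(0, len(wave) - frame_len + 1, hop)]

    def zcr(f):
        # Fraction of adjacent sample pairs whose signs differ.
        return sum(1 for a, b in zip(f, f[1:]) if a * b < 0) / (len(f) - 1)

    # One (energy, zcr) pair per frame.
    llds = [(sum(x * x for x in f) / len(f), zcr(f)) for f in frames]

    # Functionals collapse the variable-length frame sequence into a
    # fixed-size vector: [energy_mean, energy_std, zcr_mean, zcr_std].
    feats = []
    for d in range(2):
        col = [row[d] for row in llds]
        mu = sum(col) / len(col)
        feats.append(mu)
        feats.append(math.sqrt(sum((x - mu) ** 2 for x in col) / len(col)))
    return feats

random.seed(0)
utterance = [random.gauss(0.0, 1.0) for _ in range(16000)]  # 1 s synthetic signal
fr = handcrafted_functionals(utterance)
print(len(fr))  # 4, regardless of utterance duration
```

The key property this illustrates is that functionals yield a fixed-dimensional vector for any utterance length, which is what lets such FRs feed standard classifiers; SSL embeddings achieve the same by pooling learned frame representations instead of designed descriptors.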
{"title":"Comparing Data-Driven and Handcrafted Features for Dimensional Emotion Recognition","authors":"Bogdan Vlasenko, Sargam Vyas, Mathew Magimai.-Doss","doi":"10.1109/icassp48485.2024.10446134","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10446134","url":null,"abstract":"Speech Emotion Recognition (SER) has garnered significant attention over the past two decades. In the early stages of SER technology, 'brute force'-based techniques led to a significant expansion in knowledge-based acoustic feature representation (FR) for modeling sparse emotional data. However, as deep learning techniques have become more powerful, their direct application has been limited by the scarcity of well-annotated emotional data. As a result, pre-trained neural embeddings on large speech corpora have gained popularity for SER tasks. These embeddings leverage existing transfer learning methods suitable for general-purpose self-supervised learning (SSL) representations. Recent studies on downstream SSL techniques for dimensional SER have shown promising results. In this research, we aim to evaluate the emotion-discriminative characteristics of neural embeddings in general cases (out-of-domain) and when fine-tuned for SER (in-domain). Given that most SSL techniques are pre-trained primarily on English speech, we plan to use speech emotion corpora in both language-matched and mismatched conditions. We will assess the discriminative characteristics of both handcrafted and standalone neural embeddings as FRs.","PeriodicalId":517764,"journal":{"name":"ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"62 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140704868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-14 DOI: 10.1109/icassp48485.2024.10446544
Riku Arakawa, Mathieu Parvaix, Chiong Lai, Hakan Erdogan, Alex Olwal
{"title":"Quantifying The Effect Of Simulator-Based Data Augmentation For Speech Recognition On Augmented Reality Glasses","authors":"Riku Arakawa, Mathieu Parvaix, Chiong Lai, Hakan Erdogan, Alex Olwal","doi":"10.1109/icassp48485.2024.10446544","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10446544","url":null,"abstract":"","PeriodicalId":517764,"journal":{"name":"ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140706157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-14 DOI: 10.1109/icassp48485.2024.10446677
Pengwei Yin, Jingjing Wang, Jiawu Dai, Xiaojun Wu
{"title":"NERF-GAZE: A Head-Eye Redirection Parametric Model for Gaze Estimation","authors":"Pengwei Yin, Jingjing Wang, Jiawu Dai, Xiaojun Wu","doi":"10.1109/icassp48485.2024.10446677","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10446677","url":null,"abstract":"","PeriodicalId":517764,"journal":{"name":"ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"206 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140704675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-14 DOI: 10.1109/icassp48485.2024.10446247
Jiaxu Wang, Bo Xu, Hao Cheng, Renjing Xu
{"title":"DONE: Dynamic Neural Representation Via Hyperplane Neural ODE","authors":"Jiaxu Wang, Bo Xu, Hao Cheng, Renjing Xu","doi":"10.1109/icassp48485.2024.10446247","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10446247","url":null,"abstract":"","PeriodicalId":517764,"journal":{"name":"ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"200 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140704705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}