End-to-End Listening Agent for Audiovisual Emotional and Naturalistic Interactions
Kevin El Haddad, Yara Rizk, Louise Heron, Nadine Hajj, Yong Zhao, Jaebok Kim, Trung Ngo Trong, Minha Lee, Marwan Doumit, Payton Lin, Yelin Kim, Hüseyin Çakmak
Journal of Science and Technology of the Arts, vol. 10, pp. 49-61. Published 2018-11-08.
DOI: 10.7559/CITARJ.V10I2.424 (https://doi.org/10.7559/CITARJ.V10I2.424)
Citations: 3
Abstract
In this work, we established the foundations of a framework with the goal of building an end-to-end naturalistic expressive listening agent. The project was split into modules for recognition of the user’s paralinguistic and nonverbal expressions, prediction of the agent’s reactions, synthesis of the agent’s expressions, and recording of nonverbal conversational expressions. First, a multimodal, multitask, deep learning-based emotion classification system was built along with a rule-based visual expression detection system. Then several sequence prediction systems for nonverbal expressions were implemented and compared. An audiovisual concatenation-based synthesis system was also implemented. Finally, a naturalistic, dyadic emotional conversation database was collected. We report here the work done on each of these modules and our planned future improvements.
About the journal:
The Journal of Science and Technology of the Arts (CITARJ) covers a wide range of topics related to the study and practice of artistic work approached through science and technology, including:
- Aesthetics of New Media
- Audiovisual and Cinematic Art
- Computer Music
- Digital Arts
- Digital Culture
- Generative Art/Systems
- Interactive Art
- Interactive Multimedia
- Interactive Sound
- New Interfaces for Digital Expression
- New Media Art
- Tangible Interfaces