Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues.

CEUR Workshop Proceedings. Publication date: 2024-02-01

Maneesh Bilalpur, Mert Inan, Dorsa Zeinali, Jeffrey F Cohn, Malihe Alikhani

Volume 3649, pages 12-22. Journal article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11608428/pdf/

Abstract

Addressing the critical shortage of mental health resources for effective screening, diagnosis, and treatment remains a significant challenge. This scarcity underscores the need for innovative solutions, particularly ones that enhance the accessibility and efficacy of therapeutic support. Embodied agents with advanced interactive capabilities are emerging as a promising and cost-effective supplement to traditional caregiving methods. Crucial to these agents' effectiveness is their ability to simulate non-verbal behaviors, such as backchannels, which are pivotal in establishing rapport and understanding in therapeutic contexts but remain under-explored. To improve the rapport-building capabilities of embodied agents, we annotated backchannel smiles in videos of intimate face-to-face conversations on topics such as mental health, illness, and relationships. We hypothesized that both speaker and listener behaviors affect the duration and intensity of backchannel smiles. We found that cues from speech prosody and language, along with the demographics of the speaker and listener, contain significant predictors of the intensity of backchannel smiles. Based on these findings, we frame backchannel smile production in embodied agents as a generation problem. Our attention-based generative model suggests that listener information offers performance improvements over the baseline speaker-centric generation approach. Conditioning generation on the significant predictors of smile intensity provides statistically significant improvements in empirical measures of generation quality. Our user study, in which generated smiles were transferred to an embodied agent, suggests that an agent with backchannel smiles is perceived as more human-like and, for non-personal conversations, is an attractive alternative to an agent without backchannel smiles.
