A Pilot Evaluation of a Conversational Listener for Conversational User Interfaces
M. Aylett, Andrea Carmantini, Chris Pidcock, Eric Nichols, Randy Gomez
Proceedings of the 5th International Conference on Conversational User Interfaces, 2023-07-19
DOI: 10.1145/3571884.3605871
Citations: 1
Abstract
Current spoken conversational user interfaces (CUIs) are predominantly implemented using a sequential, utterance-based, two-party, speak-wait/speak-wait approach. Human-human conversation, in contrast, 1) is not sequential, featuring overlap, interruption, and back channels; 2) involves processing utterances before they are complete; and 3) is often multi-party. As part of Honda Research Institute's Haru project, a lightweight word-spotting speech recognition system (a conversational listener) was implemented to allow very fast turn-taking in simple voice interaction conditions. In this paper, we present a pilot evaluation of the conversational listener in a script-follower context (which allows a robot to act out a dialog with a user). We compare a disembodied version of the system with expressive synthesis to Alexa, with and without fast turn-taking. Qualitative results indicate that users were sensitive to turn-taking delay and to characterful speech synthesis.
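To make the core idea concrete, below is a minimal sketch of a word-spotting "conversational listener" driving a script follower. This is a hypothetical illustration, not the paper's implementation: the `ScriptTurn`, `spot`, and `run_script` names, the match threshold, and the simulated word stream are all assumptions. It shows how matching incoming words against the next expected scripted line lets the system take the turn as soon as enough of the line is spotted, instead of waiting for an end-of-utterance silence.

```python
"""Hypothetical sketch of a word-spotting conversational listener
for a script-follower dialog (not the paper's actual system)."""

from dataclasses import dataclass


@dataclass
class ScriptTurn:
    expected_user_line: str  # line the user is expected to say
    robot_response: str      # line the system speaks next


# Toy two-turn script (illustrative content only).
SCRIPT = [
    ScriptTurn("hello there", "Hi! Nice to meet you."),
    ScriptTurn("how are you today", "I'm doing great, thanks for asking."),
]


def spot(words_heard: list[str], expected: str, threshold: float = 0.6) -> bool:
    """Return True once enough of the expected line's words are heard.

    The 0.6 threshold is an assumed tuning parameter: it lets the
    system commit to a turn before the utterance is complete.
    """
    targets = set(expected.lower().split())
    hits = sum(1 for w in words_heard if w.lower() in targets)
    return hits / len(targets) >= threshold


def run_script(word_stream):
    """Consume a stream of recognized words one at a time; respond as
    soon as the current expected line is spotted (fast turn-taking)."""
    turn_idx, heard = 0, []
    for word in word_stream:
        if turn_idx >= len(SCRIPT):
            break
        heard.append(word)
        if spot(heard, SCRIPT[turn_idx].expected_user_line):
            print(f"[robot] {SCRIPT[turn_idx].robot_response}")
            turn_idx, heard = turn_idx + 1, []


if __name__ == "__main__":
    # Simulated incremental ASR output, delivered word by word.
    run_script("hello there um how are you today".split())
```

Note how the second response fires after "how are you" is heard, before "today" arrives: spotting a sufficient fraction of the expected line is what enables turn-taking latencies well below a conventional endpoint-detection pipeline.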