{"title":"Evaluating Human-AI Hybrid Conversational Systems with Chatbot Message Suggestions","authors":"Zihan Gao, Jiepu Jiang","doi":"10.1145/3459637.3482340","DOIUrl":null,"url":null,"abstract":"AI chatbots can offer suggestions to help humans answer questions by reducing text entry effort and providing relevant knowledge for unfamiliar questions. We study whether chatbot suggestions can help people answer knowledge-demanding questions in a conversation and influence response quality and efficiency. We conducted a large-scale crowdsourcing user study and evaluated 20 hybrid system variants and a human-only baseline. The hybrid systems used four chatbots of varied response quality and differed in the number of suggestions and whether to preset the message box with top suggestions. Experimental results show that chatbot suggestions---even using poor-performing chatbots---have consistently improved response efficiency. Compared with the human-only setting, hybrid systems have reduced response time by 12%--35% and keystrokes by 33%--60%, and users have adopted a suggestion for the final response without any changes in 44%--68% of the cases. In contrast, crowd workers in the human-only setting typed most of the response texts and copied 5% of the answers from other sites. However, we also found that chatbot suggestions did not always help response quality. Specifically, in hybrid systems equipped with poor-performing chatbots, users responded with lower-quality answers than others in the human-only setting. It seems that users would not simply ignore poor suggestions and compose responses as they could without seeing the suggestions. Besides, presetting the message box has improved reply efficiency without hurting response quality. We did not find that showing more suggestions helps or hurts response quality or efficiency consistently. Our study reveals how and when AI chatbot suggestions can help people answer questions in hybrid conversational systems.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3459637.3482340","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7
Abstract
AI chatbots can offer suggestions that help humans answer questions by reducing text entry effort and providing relevant knowledge for unfamiliar questions. We study whether chatbot suggestions can help people answer knowledge-demanding questions in a conversation and how they influence response quality and efficiency. We conducted a large-scale crowdsourcing user study that evaluated 20 hybrid system variants and a human-only baseline. The hybrid systems used four chatbots of varied response quality and differed in the number of suggestions shown and in whether the message box was preset with top suggestions. Experimental results show that chatbot suggestions, even those from poor-performing chatbots, consistently improved response efficiency. Compared with the human-only setting, hybrid systems reduced response time by 12%--35% and keystrokes by 33%--60%, and users adopted a suggestion as the final response without any changes in 44%--68% of the cases. In contrast, crowd workers in the human-only setting typed most of the response text and copied 5% of the answers from other sites. However, we also found that chatbot suggestions did not always help response quality. Specifically, in hybrid systems equipped with poor-performing chatbots, users responded with lower-quality answers than those in the human-only setting. Users did not simply ignore poor suggestions and compose responses as they would have without seeing them. In addition, presetting the message box improved response efficiency without hurting response quality. We did not find that showing more suggestions consistently helped or hurt response quality or efficiency. Our study reveals how and when AI chatbot suggestions can help people answer questions in hybrid conversational systems.
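As a rough illustration of how the efficiency figures above could be derived from interaction logs, the sketch below computes relative response-time reduction, keystroke reduction, and the unmodified-suggestion adoption rate for a hypothetical hybrid condition against a human-only baseline. The log fields (`response_time_s`, `keystrokes`, `final_text`, `suggestions`), the toy records, and the aggregation are assumptions for illustration only, not the authors' actual analysis pipeline.

```python
# A minimal sketch (not the paper's pipeline) of how the reported efficiency
# metrics could be computed from per-response interaction logs.
# All field names and the toy data below are hypothetical.
from statistics import mean

# Hypothetical logs: one record per answered question.
human_only = [
    {"response_time_s": 60.0, "keystrokes": 180},
    {"response_time_s": 75.0, "keystrokes": 220},
]
hybrid = [
    {"response_time_s": 48.0, "keystrokes": 90,
     "final_text": "Paris", "suggestions": ["Paris", "Lyon"]},
    {"response_time_s": 50.0, "keystrokes": 120,
     "final_text": "It is Rome.", "suggestions": ["Rome"]},
]

def reduction(baseline, condition, key):
    """Relative reduction of a metric in a condition vs. the human-only baseline."""
    b = mean(r[key] for r in baseline)
    c = mean(r[key] for r in condition)
    return (b - c) / b

# Share of hybrid responses where a suggestion was adopted verbatim.
adoption_rate = mean(
    1.0 if r["final_text"] in r["suggestions"] else 0.0 for r in hybrid
)

print(f"time reduction:      {reduction(human_only, hybrid, 'response_time_s'):.0%}")
print(f"keystroke reduction: {reduction(human_only, hybrid, 'keystrokes'):.0%}")
print(f"verbatim adoption:   {adoption_rate:.0%}")
```

In the study's terms, such per-condition aggregates would be computed separately for each of the 20 hybrid variants and compared against the human-only baseline, which is where ranges like 12%--35% and 44%--68% come from.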