{"title":"在基于语音的 \"人在回路 \"系统中用大型语言模型取代人类","authors":"Shih-Hong Huang, Ting-Hao 'Kenneth' Huang","doi":"10.1609/aaaiss.v3i1.31178","DOIUrl":null,"url":null,"abstract":"It is easy to assume that Large Language Models (LLMs) will seamlessly take over applications, especially those that are largely automated. In the case of conversational voice assistants, commercial systems have been widely deployed and used over the past decade. However, are we indeed on the cusp of the future we envisioned? There exists a social-technical gap between what people want to accomplish and the actual capability of technology. In this paper, we present a case study comparing two voice assistants built on Amazon Alexa: one employing a human-in-the-loop workflow, the other utilizes LLM to engage in conversations with users. In our comparison, we discovered that the issues arising in current human-in-the-loop and LLM systems are not identical. However, the presence of a set of similar issues in both systems leads us to believe that focusing on the interaction between users and systems is crucial, perhaps even more so than focusing solely on the underlying technology itself. Merely enhancing the performance of the workers or the models may not adequately address these issues. This observation prompts our research question: What are the overlooked contributing factors in the effort to improve the capabilities of voice assistants, which might not have been emphasized in prior research?","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"45 10","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On Replacing Humans with Large Language Models in Voice-Based Human-in-the-Loop Systems\",\"authors\":\"Shih-Hong Huang, Ting-Hao 'Kenneth' Huang\",\"doi\":\"10.1609/aaaiss.v3i1.31178\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It is easy to assume that Large Language Models (LLMs) will seamlessly take over applications, especially those that are largely automated. In the case of conversational voice assistants, commercial systems have been widely deployed and used over the past decade. However, are we indeed on the cusp of the future we envisioned? There exists a social-technical gap between what people want to accomplish and the actual capability of technology. In this paper, we present a case study comparing two voice assistants built on Amazon Alexa: one employing a human-in-the-loop workflow, the other utilizes LLM to engage in conversations with users. In our comparison, we discovered that the issues arising in current human-in-the-loop and LLM systems are not identical. However, the presence of a set of similar issues in both systems leads us to believe that focusing on the interaction between users and systems is crucial, perhaps even more so than focusing solely on the underlying technology itself. Merely enhancing the performance of the workers or the models may not adequately address these issues. 
This observation prompts our research question: What are the overlooked contributing factors in the effort to improve the capabilities of voice assistants, which might not have been emphasized in prior research?\",\"PeriodicalId\":516827,\"journal\":{\"name\":\"Proceedings of the AAAI Symposium Series\",\"volume\":\"45 10\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the AAAI Symposium Series\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1609/aaaiss.v3i1.31178\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the AAAI Symposium Series","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aaaiss.v3i1.31178","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On Replacing Humans with Large Language Models in Voice-Based Human-in-the-Loop Systems
It is easy to assume that Large Language Models (LLMs) will seamlessly take over applications, especially those that are already largely automated. In the case of conversational voice assistants, commercial systems have been widely deployed and used over the past decade. However, are we indeed on the cusp of the future we envisioned? There remains a social-technical gap between what people want to accomplish and the actual capability of the technology. In this paper, we present a case study comparing two voice assistants built on Amazon Alexa: one employing a human-in-the-loop workflow, the other using an LLM to converse with users. In our comparison, we found that the issues arising in current human-in-the-loop and LLM systems are not identical. However, the presence of a set of similar issues in both systems leads us to believe that focusing on the interaction between users and systems is crucial, perhaps even more so than focusing solely on the underlying technology itself. Merely enhancing the performance of the workers or the models may not adequately address these issues. This observation prompts our research question: what contributing factors in the effort to improve the capabilities of voice assistants have been overlooked or under-emphasized in prior research?
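To make the comparison concrete, the sketch below illustrates one way the two systems described in the abstract could share a single dialogue entry point while differing only in who produces the reply: a human-in-the-loop path that forwards each utterance to a worker, versus an LLM path that generates the response directly. This is a minimal, hypothetical illustration, not the authors' implementation; the function names (`route_to_worker`, `call_llm`, `respond`) and the latency placeholder are assumptions introduced here.

```python
# Hypothetical sketch only: contrasts the two response pipelines described in the
# abstract. Neither backend reflects the authors' actual system; the names and the
# latency placeholder are illustrative assumptions.

import time
from typing import List


def route_to_worker(utterance: str, timeout_s: float = 8.0) -> str:
    """Stand-in for the human-in-the-loop path: post the utterance to a worker
    queue and wait, within a latency budget, for a typed reply."""
    time.sleep(min(0.1, timeout_s))  # placeholder for queueing and worker latency
    return f"[worker reply to: {utterance}]"


def call_llm(utterance: str, history: List[str]) -> str:
    """Stand-in for the LLM path: send the utterance plus dialogue history to a
    language-model API and return the generated text."""
    return f"[model reply to: {utterance}] (context turns: {len(history)})"


def respond(utterance: str, history: List[str], backend: str) -> str:
    """Single dispatch point a voice-assistant skill handler could call, so the
    two systems differ only in who produces the reply."""
    reply = route_to_worker(utterance) if backend == "human" else call_llm(utterance, history)
    history.append(utterance)
    return reply


if __name__ == "__main__":
    for backend in ("human", "llm"):
        print(backend, "->", respond("Find me a quiet cafe nearby.", [], backend))
```

Keeping the dispatch point identical in a sketch like this mirrors the paper's framing: the comparison centers on user-facing interaction issues rather than on which mechanism, worker or model, produces the reply.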