Learning the Correspondence between Continuous Speeches and Motions

O. Natsuki, N. Arata, I. Yoshiaki
DOI: 10.1109/DEVLRN.2005.1490983
Journal: Proceedings of the 4th International Conference on Development and Learning, 2005
Publication date: 2005-07-19
Citations: 0

Abstract

Summary form only given. Roy (1999) developed a computational model of early lexical learning to address three questions: First, how do infants discover linguistic units? Second, how do they learn perceptually-grounded semantic categories? And third, how do they learn to associate linguistic units with appropriate semantic categories? His model coupled speech recordings with static images of objects, and acquired a lexicon of shape names. Kaplan et al. (2001) presented a model for teaching names of actions to an enhanced version of AIBO; that AIBO had built-in speech recognition facilities and behaviors. In this paper, we try to build a system that learns the correspondence between continuous speech and continuous motion without a built-in speech recognizer or built-in behaviors. We teach a RobotPHONE to respond properly to voices by guiding its hands. For example, one says 'bye-bye' to the RobotPHONE while holding its hand and waving it. From continuous input, the system must segment speech and discover the acoustic units that correspond to words. The segmentation is based on recurring patterns found by incremental reference interval-free continuous DP (IRIFCDP; Kiyama et al., 1996; Utsunomiya et al., 2004), and we accelerate the IRIFCDP using ShiftCDP (Itoh and Tanaka, 2004). The system also segments motion with the accelerated IRIFCDP, and it memorizes co-occurring speech and motion patterns. It can then respond properly to taught words by detecting them in the speech input with ShiftCDP. We gave a demonstration with a RobotPHONE at the conference. We expect that the system can learn words in any language because it has no built-in facilities specific to a particular language.
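The IRIFCDP and ShiftCDP algorithms themselves are not reproduced here, but the core operation they share — matching a stored feature pattern against a running input stream by continuous dynamic programming, with a free start point — can be sketched with a simplified subsequence DTW. This is a hypothetical illustration, not the authors' implementation: the function names, the Euclidean local distance, and the fixed detection threshold are all assumptions.

```python
import numpy as np

def subsequence_dtw_costs(template, stream):
    """Accumulated DTW cost of aligning `template` (T x d feature frames)
    inside `stream` (S x d), with a free start point in the stream.
    Returns the best warping-path cost ending at each stream frame,
    normalized by template length."""
    T, S = len(template), len(stream)
    # Local frame-to-frame Euclidean distances, shape (T, S).
    dist = np.linalg.norm(template[:, None, :] - stream[None, :, :], axis=2)
    D = np.full((T, S), np.inf)
    D[0] = dist[0]  # free start: a match may begin at any stream frame
    for i in range(1, T):
        for j in range(1, S):
            # Standard DTW recursion: insertion, deletion, or match step.
            D[i, j] = dist[i, j] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1] / T

def spot(template, stream, threshold):
    """Stream frames where a taught pattern is judged to end
    (cost below threshold), in the spirit of ShiftCDP word spotting."""
    costs = subsequence_dtw_costs(template, stream)
    return np.where(costs < threshold)[0]
```

In the paper's setting, `template` would be the feature sequence of a taught word (or motion segment) discovered by IRIFCDP, and `stream` the live input; a low-cost ending frame triggers the co-occurring memorized motion. ShiftCDP achieves the same spotting effect far more efficiently by reusing partial DP results as the input shifts, which this naive O(T·S) sketch does not attempt.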