{"title":"学习连续的言语和动作之间的对应关系","authors":"O. Natsuki, N. Arata, I. Yoshiaki","doi":"10.1109/DEVLRN.2005.1490983","DOIUrl":null,"url":null,"abstract":"Summary form only given. Roy (1999) developed a computational model of early lexical learning to address three questions: First, how do infants discover linguistic units? Second, how do they learn perceptually-grounded semantic categories? And third, how do they learn to associate linguistic units with appropriate semantic categories? His model coupled speech recordings with static images of objects, and acquired a lexicon of shape names. Kaplan et al. (2001) presented a model for teaching names of actions to an enhanced version of AIBO. The AIBO had built-in speech recognition facilities and behaviors. In this paper, we try to build a system that learns the correspondence between continuous speeches and continuous motions without a built-in speech recognizer nor built-in behaviors. We teach RobotPHONE to respond to voices properly by taking its hands. For example, one says 'bye-bye' to the RobotPHONE holding its hand and waving. From continuous input, the system must segment speech and discover acoustic units which correspond to words. The segmentation is done based on recurrent patterns which was found by incremental reference interval-free continuous DP (IRIFCDP) by Kiyama et al. (1996) and Utsunomiya et al. (2004), and we accelerate the IRIFCDP using ShiftCDP (Itoh and Tanaka, 2004). The system also segments motion by the accelerated IRIFCDP, and it memorizes co-occurring speech and motion patterns. Then, it can respond to taught words properly by detecting taught words in speech input by ShiftCDP. We gave a demonstration with a RobotPHONE at the conference. We expect that it can learn words in any languages because it has no built-in facilities specific to any language","PeriodicalId":297121,"journal":{"name":"Proceedings. The 4nd International Conference on Development and Learning, 2005.","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning the Correspondence between Continuous Speeches and Motions\",\"authors\":\"O. Natsuki, N. Arata, I. Yoshiaki\",\"doi\":\"10.1109/DEVLRN.2005.1490983\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summary form only given. Roy (1999) developed a computational model of early lexical learning to address three questions: First, how do infants discover linguistic units? Second, how do they learn perceptually-grounded semantic categories? And third, how do they learn to associate linguistic units with appropriate semantic categories? His model coupled speech recordings with static images of objects, and acquired a lexicon of shape names. Kaplan et al. (2001) presented a model for teaching names of actions to an enhanced version of AIBO. The AIBO had built-in speech recognition facilities and behaviors. In this paper, we try to build a system that learns the correspondence between continuous speeches and continuous motions without a built-in speech recognizer nor built-in behaviors. We teach RobotPHONE to respond to voices properly by taking its hands. For example, one says 'bye-bye' to the RobotPHONE holding its hand and waving. From continuous input, the system must segment speech and discover acoustic units which correspond to words. 
The segmentation is done based on recurrent patterns which was found by incremental reference interval-free continuous DP (IRIFCDP) by Kiyama et al. (1996) and Utsunomiya et al. (2004), and we accelerate the IRIFCDP using ShiftCDP (Itoh and Tanaka, 2004). The system also segments motion by the accelerated IRIFCDP, and it memorizes co-occurring speech and motion patterns. Then, it can respond to taught words properly by detecting taught words in speech input by ShiftCDP. We gave a demonstration with a RobotPHONE at the conference. We expect that it can learn words in any languages because it has no built-in facilities specific to any language\",\"PeriodicalId\":297121,\"journal\":{\"name\":\"Proceedings. The 4nd International Conference on Development and Learning, 2005.\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-07-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. The 4nd International Conference on Development and Learning, 2005.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DEVLRN.2005.1490983\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. The 4nd International Conference on Development and Learning, 2005.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DEVLRN.2005.1490983","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Learning the Correspondence between Continuous Speeches and Motions
O. Natsuki, N. Arata, I. Yoshiaki
Proceedings of the 4th International Conference on Development and Learning, 2005. DOI: 10.1109/DEVLRN.2005.1490983
Summary form only given. Roy (1999) developed a computational model of early lexical learning to address three questions: first, how do infants discover linguistic units? Second, how do they learn perceptually grounded semantic categories? And third, how do they learn to associate linguistic units with appropriate semantic categories? His model coupled speech recordings with static images of objects and acquired a lexicon of shape names. Kaplan et al. (2001) presented a model for teaching names of actions to an enhanced version of AIBO; that AIBO had built-in speech recognition facilities and built-in behaviors. In this paper, we try to build a system that learns the correspondence between continuous speech and continuous motion without a built-in speech recognizer or built-in behaviors. We teach a RobotPHONE to respond properly to voices by taking its hands; for example, one says 'bye-bye' to the RobotPHONE while holding its hand and waving it. From continuous input, the system must segment speech and discover acoustic units that correspond to words. The segmentation is based on recurrent patterns found by incremental reference interval-free continuous DP (IRIFCDP) (Kiyama et al., 1996; Utsunomiya et al., 2004), and we accelerate IRIFCDP using ShiftCDP (Itoh and Tanaka, 2004). The system also segments motion with the accelerated IRIFCDP and memorizes co-occurring speech and motion patterns. It can then respond properly to taught words by detecting them in speech input with ShiftCDP. We gave a demonstration with a RobotPHONE at the conference. We expect that the system can learn words in any language, because it has no built-in facilities specific to any particular language.
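The sketch below is not the authors' implementation; it only illustrates the overall loop the abstract describes, under simplifying assumptions. A plain DTW comparison stands in for IRIFCDP/ShiftCDP continuous DP matching, feature extraction (e.g. acoustic frames and joint-angle trajectories) is assumed to happen upstream, and all function and class names (`find_recurrent_segment`, `SpeechMotionMemory`) are hypothetical: recurring units are discovered in the continuous streams, co-occurring speech and motion segments are memorized as pairs, and a taught word is later spotted in new speech input to retrieve the paired motion.

```python
import numpy as np

def dtw_distance(a, b):
    """Length-normalised DTW distance between two feature sequences (frames x dims).
    A brute-force stand-in for the continuous DP matching used in the paper."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = float(np.linalg.norm(a[i - 1] - b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def find_recurrent_segment(stream, win, threshold):
    """Naive stand-in for IRIFCDP: return the first window of `stream` that
    recurs elsewhere in the stream (a candidate acoustic/motion unit)."""
    step = max(1, win // 2)
    for i in range(0, len(stream) - win + 1, step):
        for j in range(i + win, len(stream) - win + 1, step):
            if dtw_distance(stream[i:i + win], stream[j:j + win]) < threshold:
                return stream[i:i + win]
    return None

class SpeechMotionMemory:
    """Memorises co-occurring (speech segment, motion segment) pairs and
    responds by spotting a taught word in continuous speech input."""
    def __init__(self):
        self.pairs = []  # list of (speech_segment, motion_segment)

    def memorize(self, speech_seg, motion_seg):
        self.pairs.append((speech_seg, motion_seg))

    def respond(self, speech_stream, threshold=1.0):
        """Slide every stored speech template over the input stream and
        return the motion paired with the best match below threshold."""
        best_score, best_motion = np.inf, None
        for speech_seg, motion_seg in self.pairs:
            w = len(speech_seg)
            for s in range(0, max(1, len(speech_stream) - w + 1), max(1, w // 4)):
                d = dtw_distance(speech_seg, speech_stream[s:s + w])
                if d < best_score:
                    best_score, best_motion = d, motion_seg
        return best_motion if best_score < threshold else None
```

In the system described by the abstract, IRIFCDP discovers recurring intervals incrementally and without a fixed reference pattern, and ShiftCDP keeps the continuous matching cheap enough for on-line word spotting; the quadratic brute-force search above is only meant to make the data flow concrete.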