{"authors":"Xiaoying Song, Wenqiang Zhang, J. Weng","doi":"10.1109/TAMD.2015.2478377","journal":{"name":"IEEE Transactions on Autonomous Mental Development","volume":"7 1","pages":"273-286"},"publicationDate":"2015-11-09","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"4","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/TAMD.2015.2478377"}
Types, Locations, and Scales from Cluttered Natural Video and Actions
We model the autonomous development of brain-inspired circuits through two modalities, a video stream and an action stream, that are synchronized in time. We assume that such multimodal streams are available to a baby through inborn reflexes, self-supervision, and the caretaker's supervision as the baby interacts with the real world. By autonomous development, we mean not only that the internal (inside the “skull”) self-organization is fully autonomous, but also that the developmental program (DP) that regulates the computation of the network is task-nonspecific. In this work, task-nonspecificity is reflected by the fact that the actions associated with an attended object in a cluttered, natural, and dynamic scene are taught after the DP is finished and the “life” has begun. The actions correspond to neuronal firing patterns representing object type, object location, and object scale, but learning is directly from unsegmented cluttered scenes. Among where-what networks (WWN), this is the first that explicitly models multiple “brain” areas, each for a different range of object scales. The experiments included large-scale experiments on natural video. To show the power of automatic attention in unknown cluttered backgrounds, the last experimental group used disjoint tests with large within-class variations (3-D object rotations in very different unknown backgrounds) but small between-class variations (small object patches in large similar and different unknown backgrounds), in contrast with global classification tests such as ImageNet and Atari Games.