{"title":"视频图灵测试:迈向人类级人工智能的第一步","authors":"Minsu Lee, Yu-Jung Heo, Seongho Choi, Woo Suk Choi, Byoung-Tak Zhang","doi":"10.1002/aaai.12128","DOIUrl":null,"url":null,"abstract":"<p>The development of artificial intelligence (AI) agents capable of human-level understanding of video content and conducting conversations with humans on this basis is a promising application that people expect. However, this is a challenging task that requires the holistic integration of multimodal information with temporal dependencies and reasoning, as well as social and physical commonsense. In addition, the development of appropriate systematic evaluation methods is essential. In this context, we introduce the Video Turing Test (VTT), a blind test used to evaluate human-likeness in terms of video comprehension ability. Moreover, we propose Vincent as a video understanding AI. We explain the configuration of VTT, the architecture of Vincent to prepare for VTT and the proposed evaluation methods for video comprehension. We also estimate the current intelligence level of AI based on our results and discuss future research directions.</p>","PeriodicalId":7854,"journal":{"name":"Ai Magazine","volume":"44 4","pages":"537-554"},"PeriodicalIF":2.5000,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/aaai.12128","citationCount":"0","resultStr":"{\"title\":\"Video Turing Test: A first step towards human-level AI\",\"authors\":\"Minsu Lee, Yu-Jung Heo, Seongho Choi, Woo Suk Choi, Byoung-Tak Zhang\",\"doi\":\"10.1002/aaai.12128\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The development of artificial intelligence (AI) agents capable of human-level understanding of video content and conducting conversations with humans on this basis is a promising application that people expect. However, this is a challenging task that requires the holistic integration of multimodal information with temporal dependencies and reasoning, as well as social and physical commonsense. In addition, the development of appropriate systematic evaluation methods is essential. In this context, we introduce the Video Turing Test (VTT), a blind test used to evaluate human-likeness in terms of video comprehension ability. Moreover, we propose Vincent as a video understanding AI. We explain the configuration of VTT, the architecture of Vincent to prepare for VTT and the proposed evaluation methods for video comprehension. We also estimate the current intelligence level of AI based on our results and discuss future research directions.</p>\",\"PeriodicalId\":7854,\"journal\":{\"name\":\"Ai Magazine\",\"volume\":\"44 4\",\"pages\":\"537-554\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2023-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/aaai.12128\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ai Magazine\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/aaai.12128\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ai Magazine","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/aaai.12128","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Video Turing Test: A first step towards human-level AI
The development of artificial intelligence (AI) agents capable of human-level understanding of video content and conducting conversations with humans on this basis is a promising application that people expect. However, this is a challenging task that requires the holistic integration of multimodal information with temporal dependencies and reasoning, as well as social and physical commonsense. In addition, the development of appropriate systematic evaluation methods is essential. In this context, we introduce the Video Turing Test (VTT), a blind test used to evaluate human-likeness in terms of video comprehension ability. Moreover, we propose Vincent as a video understanding AI. We explain the configuration of VTT, the architecture of Vincent to prepare for VTT and the proposed evaluation methods for video comprehension. We also estimate the current intelligence level of AI based on our results and discuss future research directions.
期刊介绍:
AI Magazine publishes original articles that are reasonably self-contained and aimed at a broad spectrum of the AI community. Technical content should be kept to a minimum. In general, the magazine does not publish articles that have been published elsewhere in whole or in part. The magazine welcomes the contribution of articles on the theory and practice of AI as well as general survey articles, tutorial articles on timely topics, conference or symposia or workshop reports, and timely columns on topics of interest to AI scientists.