{"title":"Computational Methods for Integrating Vision and Language","authors":"Kobus Barnard","doi":"10.2200/s00705ed1v01y201602cov007","DOIUrl":null,"url":null,"abstract":"Abstract \"This is clearly the most comprehensive and thoughtful compendium of knowledge on language/vision integration out there, and I'm sure it will be a valuable resources to many researchers and instructors.\" - Sven Dickinson, Series Editor (University of Toronto) Modeling data from visual and linguistic modalities together creates opportunities for better understanding of both, and supports many useful applications. Examples of dual visual-linguistic data includes images with keywords, video with narrative, and figures in documents. We consider two key task-driven themes: translating from one modality to another (e.g., inferring annotations for images) and understanding the data using all modalities, where one modality can help disambiguate information in another. The multiple modalities can either be essentially semantically redundant (e.g., keywords provided by a person looking at the image), or largely complementary (e.g., meta data such as the camera used). Redundancy and complementarity are two ...","PeriodicalId":377202,"journal":{"name":"Synthesis Lectures on Computer Vision","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Synthesis Lectures on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2200/s00705ed1v01y201602cov007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Abstract "This is clearly the most comprehensive and thoughtful compendium of knowledge on language/vision integration out there, and I'm sure it will be a valuable resources to many researchers and instructors." - Sven Dickinson, Series Editor (University of Toronto) Modeling data from visual and linguistic modalities together creates opportunities for better understanding of both, and supports many useful applications. Examples of dual visual-linguistic data includes images with keywords, video with narrative, and figures in documents. We consider two key task-driven themes: translating from one modality to another (e.g., inferring annotations for images) and understanding the data using all modalities, where one modality can help disambiguate information in another. The multiple modalities can either be essentially semantically redundant (e.g., keywords provided by a person looking at the image), or largely complementary (e.g., meta data such as the camera used). Redundancy and complementarity are two ...