Computational Methods for Integrating Vision and Language

Kobus Barnard
DOI: 10.2200/s00705ed1v01y201602cov007
Journal: Synthesis Lectures on Computer Vision
Published: 2016-04-21 (Journal Article)
Citations: 2

Abstract

"This is clearly the most comprehensive and thoughtful compendium of knowledge on language/vision integration out there, and I'm sure it will be a valuable resource to many researchers and instructors." - Sven Dickinson, Series Editor (University of Toronto)

Modeling data from visual and linguistic modalities together creates opportunities for better understanding of both, and supports many useful applications. Examples of dual visual-linguistic data include images with keywords, video with narrative, and figures in documents. We consider two key task-driven themes: translating from one modality to another (e.g., inferring annotations for images) and understanding the data using all modalities, where one modality can help disambiguate information in another. The multiple modalities can either be essentially semantically redundant (e.g., keywords provided by a person looking at the image) or largely complementary (e.g., metadata such as the camera used). Redundancy and complementarity are two ...