Massive-scale multimedia semantic modeling

Proceedings of the 21st ACM International Conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502235

John R. Smith, Liangliang Cao


Abstract

Visual data is exploding! Worldwide, 500 billion consumer photos are taken each year, with 633 million photos taken per year in New York City alone, and 120 hours of new video are uploaded to YouTube every minute. This explosion of digital multimedia data is creating a valuable open source of insight. However, the unconstrained nature of images and video "in the wild" makes automated computer-based analysis very challenging. Furthermore, the most interesting content in multimedia files is often complex, reflecting a diversity of human behaviors, scenes, activities, and events. To address these challenges, this tutorial will provide a unified overview of two emerging techniques, semantic modeling and massive-scale visual recognition, with the goals of introducing people from different backgrounds to this exciting field and reviewing state-of-the-art research in the new computational era.


