
Proceedings of the 21st ACM international conference on Multimedia: latest publications

Improving event detection using related videos and relevance degree support vector machines
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502176
Christos Tzelepis, Nikolaos Gkalelis, V. Mezaris, Y. Kompatsiaris
In this paper, a new method that exploits related videos for the problem of event detection is proposed, where related videos are videos that are closely but not fully associated with the event of interest. In particular, the Weighted Margin SVM formulation is modified so that related class observations can be effectively incorporated in the optimization problem. The resulting Relevance Degree SVM is especially useful in problems where only a limited number of training observations is provided, e.g., for the EK10Ex subtask of TRECVID MED, where only ten positive and ten related samples are provided for the training of a complex event detector. Experimental results on the TRECVID MED 2011 dataset verify the effectiveness of the proposed method.
Citations: 14
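The relevance-degree idea can be approximated by giving each training sample its own weight on the hinge loss, so related videos enter the optimization as positives with reduced cost rather than being discarded. A minimal pure-Python sketch (not the authors' exact formulation; the toy data, weights and hyperparameters are assumptions):

```python
def train_weighted_svm(X, y, costs, lr=0.01, lam=0.01, epochs=200):
    """Linear SVM by subgradient descent on a per-sample weighted hinge loss:
    lam/2 * ||w||^2 + sum_i costs[i] * max(0, 1 - y[i] * (w.x[i] + b))."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi, ci in zip(X, y, costs):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin: hinge subgradient step
                w = [wj - lr * (lam * wj - ci * yi * xj) for wj, xj in zip(w, xi)]
                b += lr * ci * yi
            else:           # only the regularizer contributes
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Positive examples get cost 1.0; a "related" clip enters as a positive with a
# reduced cost (0.5 here, an assumed value) instead of being discarded.
X = [[2.0, 2.0], [3.0, 3.0], [1.0, 1.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, 1, -1, -1]
costs = [1.0, 1.0, 0.5, 1.0, 1.0]   # third sample is "related", down-weighted
w, b = train_weighted_svm(X, y, costs)
```

With so few observations per event (ten positives and ten related in EK10Ex), the down-weighted related samples still shape the margin instead of being wasted.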
Towards next generation multimedia recommendation systems
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502233
Jialie Shen, Xiansheng Hua, Emre Sargin
Empowered by advances in information technology such as social media networks, digital libraries and mobile computing, an ever-increasing amount of multimedia data is being produced. As the key technology for addressing the resulting information overload, multimedia recommendation systems have received a lot of attention from both industry and academia. This course aims to 1) provide a detailed review of the state of the art in multimedia recommendation; 2) analyze key technical challenges in developing and evaluating next-generation multimedia recommendation systems from different perspectives; and 3) offer some predictions about the road ahead.
Citations: 3
Gesture--sound mapping by demonstration in interactive music systems
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502214
Jules Françoise
In this paper we address the issue of mapping between gesture and sound in interactive music systems. Our approach, which we call mapping by demonstration, aims to learn the mapping from examples provided by users while they interact with the system. We propose a general framework for modeling gesture--sound sequences based on a probabilistic, multimodal and hierarchical model. Two orthogonal modeling aspects are detailed, and we describe planned research directions for improving and evaluating the proposed models.
Citations: 20
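As a toy illustration of the mapping-by-demonstration idea, the sketch below maps a gesture to sound parameters by looking up the closest recorded demonstration. This nearest-neighbour lookup is a crude stand-in for the paper's probabilistic, multimodal and hierarchical model, and all names and data are assumptions:

```python
import math

def map_gesture(gesture, demos):
    """Map a gesture feature vector to sound parameters by retrieving the
    closest user demonstration (a simplistic stand-in for the paper's model)."""
    nearest = min(demos, key=lambda d: math.dist(d[0], gesture))
    return nearest[1]

# Demonstrations recorded while the user interacts with the system:
# (gesture feature vector, sound parameters such as pitch and gain).
demos = [
    ([0.0, 0.0], {"pitch": 220.0, "gain": 0.2}),
    ([1.0, 1.0], {"pitch": 440.0, "gain": 0.8}),
]
```

A probabilistic model would instead interpolate between demonstrations and track the temporal structure of the gesture, rather than snapping to one example.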
Summary abstract for the 1st ACM international workshop on personal data meets distributed multimedia
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2503836
V. Singh, Tat-Seng Chua, R. Jain, A. Pentland
Multimedia data are now created at a macro, public scale as well as an individual, personal scale. While distributed multimedia streams (e.g., images, microblogs, and sensor readings) have recently been combined to understand multiple spatio-temporal phenomena such as epidemic spreads, seasonal patterns, and political situations, personal data (via mobile sensors and quantified-self technologies) are now being used to identify user behavior, intent, affect, social connections, health, gaze, and interest level in real time. An effective combination of the two types of data can revolutionize multiple applications, ranging from healthcare to mobility, product recommendation, and content delivery. Building systems at this intersection can lead to better-orchestrated media systems that may also improve users' social, emotional and physical well-being. For example, users trapped in risky hurricane situations can receive personalized evacuation instructions based on their health, mobility parameters, and distance to the nearest shelter. This workshop brings together researchers interested in exploring novel techniques that combine multiple streams at different scales (macro and micro) to understand and react to each user's needs.
Citations: 2
Online human gesture recognition from motion data streams
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502103
Xin Zhao, Xue Li, C. Pang, Xiaofeng Zhu, Quan Z. Sheng
Online human gesture recognition has a wide range of applications in computer vision, especially in human-computer interaction. The recent introduction of cost-effective depth cameras has sparked a new wave of research on body-movement gesture recognition. However, there are two major challenges: i) how to continuously recognize gestures from unsegmented streams, and ii) how to differentiate different styles of the same gesture from other types of gestures. In this paper, we address both problems with a new, effective and efficient feature extraction method that uses a dynamic matching approach to construct a feature vector for each frame, increasing sensitivity to the features that distinguish different gestures while decreasing sensitivity to variation within the same gesture class. Our comprehensive experiments on the MSRC-12 Kinect Gesture and MSR-Action3D datasets demonstrate superior performance compared to state-of-the-art approaches.
Citations: 92
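One simple way to build a per-frame feature vector, loosely in the spirit of the paper's dynamic matching (the exemplar bank and toy 2-D "poses" below are assumptions, not the authors' method), is to measure the distance from the current frame to a set of class exemplar poses, which lets recognition run continuously on an unsegmented stream, one frame at a time:

```python
import math

def frame_feature(frame, exemplars):
    """Per-frame feature vector: distances from the current skeleton frame
    to a bank of per-class exemplar poses (a simplified stand-in for the
    paper's dynamic matching)."""
    return [math.dist(frame, ex) for ex in exemplars]

def classify_frame(frame, exemplars):
    """Label the frame with the class of its nearest exemplar, so no prior
    segmentation of the stream into gesture intervals is needed."""
    feats = frame_feature(frame, exemplars)
    return min(range(len(feats)), key=feats.__getitem__)

# Two toy "gesture classes" and a stream of incoming frames:
exemplars = [[0.0, 0.0], [10.0, 10.0]]
stream = [[0.5, 0.2], [9.0, 9.5], [0.1, 0.4]]
labels = [classify_frame(f, exemplars) for f in stream]   # one label per frame
```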
3D teleimmersive activity classification based on application-system metadata
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502194
Aadhar Jain, A. Arefin, Raoul Rivas, Chien-Nan Chen, K. Nahrstedt
Being able to detect and recognize human activities is essential in 3D collaborative applications for efficient quality-of-service provisioning and device management. A broad range of research has been devoted to analyzing media data to identify human activity, which requires knowledge of the data format and application-specific coding techniques, as well as computationally expensive image analysis. In this paper, we propose a human activity detection technique based on application-generated metadata and related system metadata. Our approach does not depend on a specific data format or coding technique. We evaluate our algorithm with different cyber-physical setups and show that it can achieve very high accuracy (above 97%) with a good learning model.
Citations: 6
The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502223
J. Wagner, F. Lingenfelser, Tobias Baur, Ionut Damian, Felix Kistler, E. André
Automatic detection and interpretation of social signals carried by voice, gestures, facial expressions, etc. will play a key role in next-generation interfaces, as it paves the way towards more intuitive and natural human-computer interaction. This paper introduces Social Signal Interpretation (SSI), a framework for real-time recognition of social signals. SSI supports a large range of sensor devices, filter and feature algorithms, as well as machine learning and pattern recognition tools. It encourages developers to add new components using SSI's C++ API, but also addresses front-end users by offering an XML interface for building pipelines with a text editor. SSI is freely available under the GPL at http://openssi.net.
Citations: 183
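SSI's pipelines chain sensors, filters and recognizers that process each incoming sample in turn. The Python sketch below illustrates that pattern only; it is not SSI's actual C++ or XML API, and the stages are invented toy examples:

```python
class Pipeline:
    """Chain of processing stages applied to each incoming sample,
    mirroring a sensor -> filter -> recognizer pipeline (illustrative
    sketch, not the SSI framework's real API)."""
    def __init__(self, *stages):
        self.stages = stages

    def push(self, sample):
        for stage in self.stages:
            sample = stage(sample)
        return sample

def smooth(window):
    # simple moving-average "filter" stage over a window of raw sensor values
    return sum(window) / len(window)

def detect(level, threshold=0.5):
    # toy "recognizer" stage: active vs. idle social signal
    return "active" if level > threshold else "idle"

pipe = Pipeline(smooth, detect)
```

In SSI itself the equivalent pipeline would be declared in XML and run on live sensor streams in real time.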
Modeling local descriptors with multivariate gaussians for object and scene recognition
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502185
G. Serra, C. Grana, M. Manfredi, R. Cucchiara
Common techniques represent images by quantizing local descriptors and summarizing their distribution in a histogram. In this paper we propose a parametric description instead, and compare its capabilities to histogram-based approaches. We use the multivariate Gaussian distribution, applied over SIFT descriptors extracted with dense sampling on a spatial pyramid. Each distribution is converted to a high-dimensional descriptor by concatenating the mean vector and the projection of the covariance matrix onto the Euclidean space tangent to the Riemannian manifold. Experiments on Caltech-101 and ImageCLEF2011 are performed using a Stochastic Gradient Descent solver, which can handle large-scale datasets and high-dimensional feature spaces.
Citations: 12
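The descriptor construction can be sketched directly. For simplicity the example below assumes a diagonal covariance, where the matrix logarithm (the tangent-space projection) reduces to an elementwise log of the variances; the paper uses full covariance matrices over densely sampled SIFT descriptors:

```python
import math

def gaussian_descriptor(samples):
    """Descriptor for a set of local features: the mean vector concatenated
    with the log of the covariance. This sketch assumes a *diagonal*
    covariance, so the matrix logarithm reduces to an elementwise log;
    the paper projects the full covariance onto the tangent space."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    var = [sum((s[j] - mean[j]) ** 2 for s in samples) / n for j in range(d)]
    log_var = [math.log(v + 1e-9) for v in var]   # small epsilon for stability
    return mean + log_var   # concatenated descriptor of length 2d

desc = gaussian_descriptor([[1.0, 2.0], [3.0, 4.0]])
```

The resulting fixed-length vector can be fed to any linear classifier, e.g. one trained with stochastic gradient descent as in the paper.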
Massive-scale multimedia semantic modeling
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502235
John R. Smith, Liangliang Cao
Visual data is exploding! 500 billion consumer photos are taken each year worldwide, 633 million per year in NYC alone, and 120 hours of new video are uploaded to YouTube every minute. The explosion of digital multimedia data is creating a valuable open source of insights. However, the unconstrained nature of 'images and video in the wild' makes automated computer-based analysis very challenging. Furthermore, the most interesting content in multimedia files is often complex in nature, reflecting a diversity of human behaviors, scenes, activities and events. To address these challenges, this tutorial provides a unified overview of two emerging techniques, semantic modeling and massive-scale visual recognition, with the goal of both introducing people from different backgrounds to this exciting field and reviewing state-of-the-art research in the new computational era.
Citations: 0
Fourth international workshop on human behavior understanding (HBU 2013)
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2503830
A. A. Salah, H. Hung, O. Aran, H. Gunes
With advances in pattern recognition and multimedia computing, it has become possible to analyze human behavior via multimodal sensors, at different time scales and at different levels of interaction and interpretation. This ability opens up enormous possibilities for multimedia and multimodal interaction, with the potential of endowing computers with a capacity to attribute meaning to users' attitudes, preferences, personality, social relationships, etc., as well as to understand what people are doing, the activities they have been engaged in, and their routines and lifestyles. This workshop gathers researchers addressing the problem of modeling human behavior in its multiple facets, with particular attention to interactions in arts, creativity, entertainment and edutainment.
Citations: 0