Title: Cloud based multimedia analytic platform
Authors: Peng Wu, R. Vernica, Qian Lin
DOI: 10.1145/2502081.2502254

The Multimedia Analytic Platform is a cloud-based service that exposes state-of-the-art multimedia technologies for mobile and web application development. As a product-quality service platform, it offers comprehensive API documentation, code examples, a service description, and a trial sandbox for each multimedia technology. Cloud storage and a distributed computing framework allow the platform to run robustly and efficiently. The technologies currently supported by the platform include face detection, face verification, face demographic estimation, feature extraction, image matching, and image collage. Since its initial public launch in October 2012, the platform has been adopted by universities and third-party companies for course support and application development.
Title: Gesture-based control of physical modeling sound synthesis: a mapping-by-demonstration approach
Authors: Jules Françoise, Norbert Schnell, Frédéric Bevilacqua
DOI: 10.1145/2502081.2502262

We address the issue of mapping between gesture and sound for gesture-based control of physical modeling sound synthesis. We propose an approach called mapping by demonstration, which allows users to design the mapping by performing gestures while listening to sound examples. The system is based on a multimodal model that learns the relationships between gestures and sounds.
Title: EigenNews: a personalized news video delivery platform
Authors: Matt C. Yu, Péter Vajda, David M. Chen, Sam S. Tsai, Maryam Daneshi, A. Araújo, Huizhong Chen, B. Girod
DOI: 10.1145/2502081.2502270

We demonstrate EigenNews, a personalized television news system. Upon visiting the EigenNews website, a user is shown a variety of news videos that have been automatically selected based on her individual preferences. These videos are extracted from 16 continually recorded television programs using a multimodal segmentation algorithm. Relevant metadata for each video are generated by linking videos to online news articles. Selected news videos can be watched in three different layouts and on various devices.
Title: Context-aware gesture recognition in classical music conducting
Authors: Álvaro Sarasúa
DOI: 10.1145/2502081.2502216

Body movement has received increasing attention in music technology research in recent years. Some new musical interfaces use gestures to control music in a meaningful and intuitive way. A typical approach is the orchestra-conducting paradigm, in which the computer that generates the music acts as a virtual orchestra conducted by the user. However, although conductors' gestures are complex and their meaning can vary depending on the musical context, this context dependency remains largely unexplored. We propose a method to study the context dependency of conductors' body and facial gestures in orchestral classical music, based on temporal clustering of gestures into actions, followed by an analysis of the evolution of audio features after action occurrences. For this, multimodal data (audio, video, motion capture) will be recorded in live concert and rehearsal situations using unobtrusive techniques.
Title: Large-scale web video shot ranking based on visual features and tag co-occurrence
Authors: Do Hang Nga, Keiji Yanai
DOI: 10.1145/2502081.2502139

In this paper, we propose a novel ranking method, VisualTextualRank, which extends [1] and [2]. Our method is based on a random walk over a bipartite graph that effectively integrates the visual information of video shots with the tag information of Web videos. Instead of treating the textual information as an additional feature for shot ranking, we exploit the mutual reinforcement between shots and the textual information of their corresponding videos to improve the ranking. We apply the proposed method to a system that automatically extracts relevant video shots of specific actions from Web videos [3]. Our experimental results demonstrate that the ranking method improves the performance of video shot retrieval.
Title: 2nd international workshop on socially-aware multimedia (SAM'13)
Authors: Pablo César, Matthew Cooper, David A. Shamma, Doug Williams
DOI: 10.1145/2502081.2503833

Multimedia social communication is becoming commonplace. Television is becoming smart and social; media sharing applications are transforming the way we converse and recall events; and videoconferencing is a common application on our computers, phones, tablets, and even televisions. The confluence of computer-mediated interaction, social networking, and multimedia content is radically reshaping social communication, bringing new challenges and opportunities. This workshop, in its second edition, provides an opportunity to explore socially-aware multimedia, in which the social dimension of mediated interactions between people is considered as important as the characteristics of the media content itself. Even though this social dimension is implicitly addressed in some current solutions, further research is needed to better understand what makes multimedia socially aware.
Title: Beauty is here: evaluating aesthetics in videos using multimodal features and free training data
Authors: Yanran Wang, Qi Dai, Rui Feng, Yu-Gang Jiang
DOI: 10.1145/2502081.2508121

The aesthetics of videos can serve as a useful cue for improving user satisfaction in many applications, such as search and recommendation. In this paper, we demonstrate a computational approach to automatically evaluating the aesthetics of videos, with particular emphasis on identifying beautiful scenes. Using a standard classification pipeline, we analyze the effectiveness of a comprehensive set of features, ranging from low-level visual features and mid-level semantic attributes to style descriptors. In addition, since public training data with manual labels of video aesthetics is limited, we explore freely available resources under the simple assumption that people tend to share aesthetically appealing works more often than unappealing ones. Specifically, we use images from DPChallenge and videos from Flickr as positive training data, and Dutch documentary videos, which consist mostly of old material of low visual quality, as negative data. Our extensive evaluations show that combining multiple features is helpful and that very promising results can be obtained with the noisy but annotation-free training data. On the NHK Multimedia Challenge dataset, we attain a Spearman's rank correlation coefficient of 0.41.
Title: Spatialized audio multiparty teleconferencing with commodity miniature microphone array
Authors: V. Nguyen, Shengkui Zhao, T. Vu, Douglas L. Jones, M. Do
DOI: 10.1145/2502081.2502146

This paper presents a Spatialized Audio Multiparty Teleconferencing (SAMT) system that offers a radically new communication experience for group teleconferencing. The system includes our recently developed 3D audio technologies: 3D sound source localization (SSL) and 3D audio capture and reproduction using a low-cost, compact microphone array. In essence, the SAMT system offers 3D audio capture and spatial audio perception with multiple participants at a site, capabilities that current teleconferencing solutions still lack. In addition to identifying and automatically tracking the active speaker, the system enables a more compelling visual presentation for effective communication. Requiring only a low-cost microphone array and a consumer depth camera, the proposed system runs reliably and comfortably in real time on a commodity laptop or desktop PC. Given this minimal deployment requirement, we present a variety of user experiences created by SAMT.
Title: Learning with limited and noisy tagging
Authors: Yingming Li, Zhongang Qi, Zhongfei Zhang, Mingyuan Yang
DOI: 10.1145/2502081.2502111

With the rapid development of social networks, tagging has become an important driver of that development. A robust tagging method must meet two challenging requirements: coping with limited labeled training samples and with noisy labels. In this paper, we investigate this challenging problem of learning with limited and noisy tagging and propose a discriminative model, called SpSVM-MC, that exploits both labeled and unlabeled data through semi-parametric regularization and incorporates multi-label constraints into the optimization. While SpSVM-MC is a general method for learning with limited and noisy tagging, our evaluations focus on the specific application of noisy image tagging with limited labeled training samples on a benchmark dataset. Theoretical analysis and extensive evaluations against the state of the art demonstrate that SpSVM-MC achieves superior performance.
Title: Projective identity and procedural rhetoric in educational multimedia: towards the enrichment of programming self-concept and growth mindset with fantasy role-play
Authors: M. A. Scott
DOI: 10.1145/2502081.2502209

There is a growing movement in the behavioral sciences towards exploring more situated, pragmatic, and ontological accounts of human learning. Positive psychology shows that a reciprocal relationship may exist between self-concept and the development of expertise, while social psychology reveals that mindsets about the nature of personal traits can have profound impacts on practice behavior. Thus, nurturing psychological constructs through the learning environment may empower students, enabling them to learn more effectively. Educational multimedia is known to support learning in a range of contexts, but its role in facilitating such self-enrichment has seldom been explored. Consequently, it is not clear which designs can aid both self-enhancement and skill development. This doctoral symposium paper proposes that an interplay between projective identity and procedural rhetoric, delivered in the form of a fantasy role-playing experience, could be one such practice. Early experiments in the area of introductory programming show promise, but raise questions about external validity, educationally relevant effect sizes, and how the multimedia elements within the tool could be used more effectively to enhance these effects.