
Latest publications from the 2015 IEEE International Symposium on Multimedia (ISM)

Optimizing HEVC CABAC Decoding with a Context Model Cache and Application-Specific Prefetching
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.97
Philipp Habermann, C. C. Chi, M. Alvarez-Mesa, B. Juurlink
Context-based Adaptive Binary Arithmetic Coding is the entropy coding module in the most recent JCT-VC video coding standard HEVC/H.265. As in the predecessor H.264/AVC, CABAC is a well-known throughput bottleneck due to its strong data dependencies. Besides other optimizations, the replacement of the context model memory by a smaller cache has been proposed, resulting in an improved clock frequency. However, the effect of potential cache misses has not been properly evaluated. Our work fills this gap and performs an extensive evaluation of different cache configurations. Furthermore, it is demonstrated that application-specific context model prefetching can effectively reduce the miss rate and make it negligible. Best overall performance results were achieved with caches of two and four lines, where each cache line consists of four context models. Four cache lines allow a speed-up of 10% to 12% for all video configurations, while two cache lines improve the throughput by 9% to 15% for high bitrate videos and by 1% to 4% for low bitrate videos.
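The paper evaluates this idea at the hardware level; purely as a software illustration of the cache-plus-prefetch concept, the sketch below models a small fully associative LRU cache whose lines each hold four context models, with a prefetch hook the decoder could call for the context it expects to need next. All names here (ContextModelCache, the toy access trace) are hypothetical and not taken from the paper.

```python
from collections import OrderedDict

LINE_SIZE = 4  # context models per cache line, matching the paper's best configurations

class ContextModelCache:
    """Hypothetical software model of a context-model cache with LRU replacement."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line index -> list of context models
        self.hits = 0
        self.misses = 0

    def access(self, ctx, memory):
        """Read one context model; count a hit or a miss."""
        line = ctx // LINE_SIZE
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)      # refresh LRU position
        else:
            self.misses += 1
            self._fill(line, memory)
        return self.lines[line][ctx % LINE_SIZE]

    def prefetch(self, ctx, memory):
        """Application-specific prefetch: load the line for the context model
        the decoder predicts it will read next, so the later access hits."""
        line = ctx // LINE_SIZE
        if line not in self.lines:
            self._fill(line, memory)

    def _fill(self, line, memory):
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)    # evict the least recently used line
        start = line * LINE_SIZE
        self.lines[line] = memory[start:start + LINE_SIZE]

# Toy run: with perfect next-context prediction, every access after the first hits.
memory = list(range(64))               # stand-in for the context-model memory
cache = ContextModelCache(num_lines=4)
trace = [0, 1, 2, 3, 8, 9, 0, 16, 17]  # hypothetical access pattern
for i, ctx in enumerate(trace):
    cache.access(ctx, memory)
    if i + 1 < len(trace):
        cache.prefetch(trace[i + 1], memory)
print(cache.hits, cache.misses)        # prefetching turns most misses into hits
```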
Citations: 1
Songle Widget: Making Animation and Physical Devices Synchronized with Music Videos on the Web
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.64
Masataka Goto, Kazuyoshi Yoshii, Tomoyasu Nakano
This paper describes a web-based multimedia development framework, Songle Widget, that makes it possible to control computer-graphic animation and physical devices such as lighting devices and robots in synchronization with music publicly available on the web. To avoid the difficulty of time-consuming manual annotation, Songle Widget makes it easy to develop web-based applications with rigid music synchronization by leveraging music-understanding technologies. Four types of musical elements (music structure, hierarchical beat structure, melody line, and chords) have been automatically annotated for more than 920,000 songs on music- or video-sharing services and can readily be used by music-synchronized applications. Since errors are inevitable when elements are annotated automatically, Songle Widget takes advantage of a user-friendly crowdsourcing interface that enables users to correct them. This is effective when applications require error-free annotation. We made Songle Widget open to the public, and its capabilities and usefulness have been demonstrated in seven music-synchronized applications.
Citations: 7
Content-Based Multimedia Copy Detection
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.40
Chahid Ouali, P. Dumouchel, Vishwa Gupta
In this paper, we address the problem of multimedia content-based copy detection. We propose several audio and video fingerprints that are highly robust to audio and video transformations. We propose to accelerate the search of fingerprints by using a Graphics Processing Unit (GPU). To speed up this search even further, we propose a two-step search based on a clustering technique and a lookup table that reduces the number of comparisons between the query and the reference fingerprints. We evaluate our fingerprints on the well-known TRECVID 2009 and 2010 datasets, and we show that the proposed fingerprints outperform other state-of-the-art audio and video fingerprints while being significantly faster.
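The paper's fingerprints and GPU kernels are not reproduced here; purely as a sketch of the two-step idea (cluster the reference fingerprints offline, keep a lookup table from cluster id to member indices, and compare a query only against the few closest clusters), the following uses k-means and Euclidean distance as stand-ins for their matching function. Function names and parameters (build_index, n_probe) are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_index(ref_fps, n_clusters=64):
    """Offline step: cluster reference fingerprints and build a lookup table
    mapping each cluster id to the indices of its member fingerprints."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(ref_fps)
    table = {c: np.flatnonzero(km.labels_ == c) for c in range(n_clusters)}
    return km, table

def search(query, km, table, ref_fps, n_probe=2):
    """Online step: rank clusters by centroid distance, then compare the query
    only against members of the n_probe closest clusters."""
    d_cent = np.linalg.norm(km.cluster_centers_ - query, axis=1)
    candidates = np.concatenate([table[c] for c in np.argsort(d_cent)[:n_probe]])
    d = np.linalg.norm(ref_fps[candidates] - query, axis=1)
    return int(candidates[np.argmin(d)]), float(d.min())

# Toy run: only roughly n_probe/n_clusters of the references are compared per query.
rng = np.random.default_rng(0)
refs = rng.normal(size=(10_000, 32)).astype(np.float32)
km, table = build_index(refs)
print(search(refs[123] + 0.01 * rng.normal(size=32), km, table, refs))
```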
Citations: 4
Vclick: Endpoint Driven Enterprise WebRTC
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.92
Kundan Singh, J. Yoakum
We present a robust, scalable and secure system architecture for web-based multimedia collaboration that keeps the application logic in the endpoint browser. Vclick is a simple and easy-to-use application for video interaction, collaboration and presence using HTML5 technologies including WebRTC (Web Real Time Communication), and is independent of legacy Voice-over-IP systems. Since its conception in early 2013, it has received much positive feedback, undergone improvements, and has been used in many enterprise communications research projects both in the cloud and on premises, on desktop as well as mobile. The techniques used and the challenges faced are useful to other emerging WebRTC applications.
Citations: 4
Musical Similarity and Commonness Estimation Based on Probabilistic Generative Models
Pub Date : 2015-12-01 DOI: 10.1142/S1793351X1640002X
Tomoyasu Nakano, Kazuyoshi Yoshii, Masataka Goto
This paper proposes a novel concept we call musical commonness, which is the similarity of a song to a set of songs; in other words, its typicality. This commonness can be used to retrieve representative songs from a song set (e.g., songs released in the 80s or 90s). Previous research on musical similarity has compared two songs but has not evaluated the similarity of a song to a set of songs. The methods presented here for estimating the similarity and commonness of polyphonic musical audio signals are based on a unified framework of probabilistic generative modeling of four musical elements (vocal timbre, musical timbre, rhythm, and chord progression). To estimate the commonness, we use a generative model trained from a song set instead of estimating musical similarities of all possible song-pairs by using a model trained from each song. In an experimental evaluation, we used 3278 popular music songs. Estimated song-pair similarities are comparable to ratings by a musician at the 0.1% significance level for vocal and musical timbre, at the 1% level for rhythm, and at the 5% level for chord progression. Results of commonness evaluation show that the higher the musical commonness is, the more similar a song is to songs of a song set.
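The paper's unified framework covers four musical elements with dedicated generative models; as a much-simplified sketch of the core idea (commonness as the likelihood of a song under a model trained on the whole set, rather than pairwise similarity), the code below fits one Gaussian mixture over pooled frame features. Feature extraction is out of scope and all names are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_song_set_model(song_feature_list, n_components=16):
    """Fit one generative model on frame features pooled over the song set.
    (The paper trains per-element models; a single diagonal-covariance GMM
    over generic frame features is a deliberate simplification.)"""
    pooled = np.vstack(song_feature_list)
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag", random_state=0).fit(pooled)

def commonness(song_features, set_model):
    """Commonness of one song with respect to the set: the mean per-frame
    log-likelihood under the set model. Higher means more typical."""
    return float(set_model.score(song_features))  # score() = mean log-likelihood

# Toy run: a song drawn from the same distribution as the set scores higher
# than an outlier song.
rng = np.random.default_rng(0)
song_set = [rng.normal(size=(200, 12)) for _ in range(30)]  # 30 songs, 12-dim frames
model = fit_song_set_model(song_set)
typical = rng.normal(size=(200, 12))
outlier = rng.normal(loc=5.0, size=(200, 12))
print(commonness(typical, model) > commonness(outlier, model))  # True
```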
Citations: 8
Adaptive Error Protection for the Streaming of Motion JPEG 2000 Video over Variable Bit Error Rate Channels
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.30
G. Baruffa, F. Frescura
In this paper we present a technique that can be used for optimizing the streaming of Motion JPEG 2000 video when the communication channel can be abstracted as a binary symmetric channel (BSC), characterized by a slowly varying rate of transmission errors. The video is packetized and every packet is protected with a Reed-Solomon forward error correction (FEC) code and then sent on the communication channel. The FEC helps to recover from erased or error-affected source information bytes at the receiver-decoder. The optimized amount of error protection is chosen using an uncomplicated mathematical expression, given knowledge of the channel status. In particular, we show the results of simulations that outline the robustness of this technique and its clear advantage over naive solutions.
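The paper's exact expression is not reproduced here; as an illustrative stand-in, the sketch below uses the standard binomial model for a Reed-Solomon RS(n, k) code over a BSC: a byte is corrupted if any of its bits flips, the code corrects up to t = (n - k)/2 byte errors, and the smallest parity budget meeting a residual-loss target is found by a linear scan. The function names and the target value are assumptions.

```python
from math import comb

def byte_error_rate(ber, bits_per_byte=8):
    """On a BSC with bit error rate ber, a byte is corrupted unless
    all of its bits survive."""
    return 1.0 - (1.0 - ber) ** bits_per_byte

def decode_failure_prob(n, t, p_byte):
    """An RS code correcting t byte errors fails when more than t of the
    n code bytes are corrupted (binomial model, independent errors)."""
    p_ok = sum(comb(n, i) * p_byte**i * (1.0 - p_byte)**(n - i)
               for i in range(t + 1))
    return 1.0 - p_ok

def min_parity(n, ber, target=1e-6):
    """Smallest number of parity bytes (n - k = 2t) keeping the residual
    packet-loss probability below target for the current channel BER."""
    p_byte = byte_error_rate(ber)
    for t in range(n // 2 + 1):
        if decode_failure_prob(n, t, p_byte) <= target:
            return 2 * t
    return None  # channel too noisy for this block length

# Toy run: as the measured BER degrades, the sender adapts by adding parity.
for ber in (1e-5, 1e-4, 1e-3):
    print(ber, min_parity(255, ber))
```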
Citations: 0
WTA Hash-Based Multimodal Feature Fusion for 3D Human Action Recognition
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.11
Jun Ye, Kai Li, K. Hua
With the prevalence of commodity depth sensors (e.g., Kinect), multimodal data including RGB, depth and audio streams have been utilized in various applications such as video games, education and health. Nevertheless, it is still very challenging to effectively fuse features from multimodal data. In this paper, we propose a WTA (Winner-Take-All) Hash-based feature fusion algorithm and investigate its application in 3D human action recognition. Specifically, WTA hashing is performed to encode features from different modalities into the ordinal space. By leveraging ordinal measures rather than the absolute values of the original features, such a feature embedding provides a form of resilience to scale and numerical perturbations. We propose a frame-level feature fusion algorithm and develop a WTA Hash-embedded warping algorithm to measure the similarity between two sequences. Experiments performed on three public 3D human action datasets show that the proposed fusion algorithm achieves state-of-the-art recognition results even with a nearest neighbor search.
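As background for readers unfamiliar with WTA hashing (Yagnik et al.), the sketch below shows the basic encoding: permute the feature vector with each stored random permutation, keep the first K entries, and record the index of the maximum. Because only the ordering of values matters, rescaling or small perturbations of the features leave the codes unchanged. The paper's fusion and warping algorithms are not shown, and all names are assumptions.

```python
import numpy as np

def make_perms(dim, n_hashes, seed=0):
    """One random permutation of the feature indices per hash output."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(dim) for _ in range(n_hashes)]

def wta_hash(x, perms, K=4):
    """WTA code: for each permutation, look at the first K permuted entries
    and emit the index (0..K-1) of the largest one. Ordinal, not metric."""
    return np.array([int(np.argmax(x[p[:K]])) for p in perms])

def code_similarity(a, b):
    """Fraction of matching code symbols, a Hamming-style similarity."""
    return float(np.mean(a == b))

# Toy run: rescaling a feature vector preserves its ordinal structure, so the
# WTA codes match exactly, while an unrelated vector matches only by chance.
rng = np.random.default_rng(1)
perms = make_perms(dim=128, n_hashes=64)
x = rng.normal(size=128)
print(code_similarity(wta_hash(x, perms), wta_hash(3.0 * x, perms)))              # 1.0
print(code_similarity(wta_hash(x, perms), wta_hash(rng.normal(size=128), perms)))  # ~1/K
```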
Citations: 5
SalAd: A Multimodal Approach for Contextual Video Advertising
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.75
C. Xiang, Tam V. Nguyen, M. Kankanhalli
The explosive growth of multimedia data on the Internet has created huge opportunities for online video advertising. In this paper, we propose a novel advertising technique called SalAd, which utilizes textual information, visual content and webpage saliency to automatically associate the most suitable companion ads with online videos. Unlike most existing approaches, which only focus on selecting the most relevant ads, SalAd further considers the saliency of the selected ads to reduce intentional ignorance. SalAd consists of three basic steps. Given an online video and a set of advertisements, we first roughly identify a set of relevant ads based on textual information matching. We then carefully select a subset of candidates based on visual content matching. In this regard, our selected ads are contextually relevant to the online video content in terms of both textual information and visual content. We finally select the most salient ad among the relevant ads as the most appropriate one. To demonstrate the effectiveness of our method, we have conducted a rigorous eye-tracking experiment on two ad datasets. The experimental results show that our method enhances user engagement with the ad content while maintaining users' quality of video viewing experience.
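The abstract's three-step pipeline can be made concrete with a toy sketch: a coarse textual filter, a finer visual filter, and a final pick by saliency. The scoring functions below (keyword Jaccard overlap, cosine similarity, a precomputed saliency value) are placeholders for the paper's actual features, and every name is hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Ad:
    tags: set           # textual keywords of the ad
    visual: np.ndarray  # visual feature vector (stand-in for real descriptors)
    saliency: float     # precomputed saliency score of the ad creative

def text_relevance(video_tags, ad):
    """Step 1 (coarse): keyword Jaccard overlap as a stand-in for the
    paper's textual matching."""
    return len(video_tags & ad.tags) / len(video_tags | ad.tags)

def visual_relevance(video_feat, ad):
    """Step 2 (finer): cosine similarity between visual features."""
    return float(video_feat @ ad.visual /
                 (np.linalg.norm(video_feat) * np.linalg.norm(ad.visual)))

def select_ad(video_tags, video_feat, ads, text_top=50, visual_top=5):
    """Step 3: among the textually then visually matched candidates, return
    the most salient ad, balancing relevance against noticeability."""
    step1 = sorted(ads, key=lambda a: text_relevance(video_tags, a), reverse=True)[:text_top]
    step2 = sorted(step1, key=lambda a: visual_relevance(video_feat, a), reverse=True)[:visual_top]
    return max(step2, key=lambda a: a.saliency)

# Toy run over randomly generated ads.
rng = np.random.default_rng(0)
vocab = ["car", "speed", "race", "food", "travel", "music"]
ads = [Ad(tags=set(rng.choice(vocab, size=2, replace=False)),
          visual=rng.normal(size=8), saliency=float(rng.uniform()))
       for _ in range(100)]
print(select_ad({"car", "race"}, rng.normal(size=8), ads, text_top=20))
```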
Citations: 15
An Adopter Centric API and Visual Programming Interface for the Definition of Strategies for Automated Camera Tracking
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.106
Benjamin Wulff, Andrew Wilson, Beate Jost, M. Ketterl
The LectureSight system provides a facility for controlling robotic cameras in a fully automated way during live presentation recordings. In this paper we present how the system accommodates the very heterogeneous demands in the domain of lecture capture at universities. An API for the JavaScript programming language gives adopters the freedom to formulate their own camera steering strategy. A graphical programming environment, based on the Open Roberta project, further eases the development of the steering logic. The accomplishments of the LectureSight project should serve as an example of how integrated measurement and control systems can be made highly customizable, giving adopters the power to fully exploit the range of possibilities a technology provides.
Citations: 2
On Virtualization of Red-Button Signaling in Hybrid TV
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.109
A. Mikityuk, Martin Platschek, O. Friedrich
Hybrid Broadcast Broadband TV (HbbTV) is a European hybrid TV standard that combines web-based technologies with traditional TV broadcast services. With the release of the HbbTV 2.0 specification in mid-February 2015, the HbbTV standard has begun to gain momentum worldwide, e.g. in North and South America, in Asia and in the UK. Indeed, the ATSC 3.0 standard in North America will be harmonized with HbbTV 2.0. In Europe we already face the fact that the HbbTV device market has become very fragmented. This is due to versioning of the standard, the varying hardware capabilities of HbbTV devices, or the lack of HbbTV support on millions of devices. In this work we address the challenge of device fragmentation in the HbbTV market with a cloud-enabled HbbTV concept. The concept is based on the virtualization of HbbTV application signaling, the so-called Red-button signaling. In our approach, the Red-button signaling is terminated, executed and handled within the cloud. This work presents the architecture that enables the cloud HbbTV approach and the implementation of this architecture. Finally, an evaluation of the architecture and the corresponding challenges are presented.
Citations: 0