
Latest publications: 2015 IEEE International Symposium on Multimedia (ISM)

UStream: Ultra-Metric Spanning Overlay Topology for Peer-to-Peer Streaming Systems
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.82
O. Ojo, A. Oluwatope, Olufemi Ogunsola
The last decade has seen a sharp increase in Internet traffic due to the dispersion of videos. Despite this growth, users' quality of experience (QoE) in peer-to-peer (P2P) streaming systems does not match that of conventional television service. The deployment of P2P streaming systems is affected by long delays, unplanned interruptions, flash crowds, high churn, and the choice of overlay structure. The overlay structure plays a significant role in ensuring that traffic is distributed over all physical links in a dynamic and fair manner; tree-based (TB) and mesh-based (MB) structures are the most popular. TB fails when the parent peer fails, which can lead to total collapse of the system, while MB is more vulnerable to flash crowds and high churn due to its unstructured pattern. This paper presents a novel P2P streaming topology (UStream) that uses a hybrid of TB and MB to address the disadvantages of both topologies and ensure an optimal solution. Furthermore, UStream adopts the features of an ultra-metric tree to ensure that the time taken from the root peer to any of the child peers is equal, and uses the spanning tree to monitor all peers at any point in time. UStream also employs a principle of chaos theory: the present determines the future, though the approximate present does not approximately determine the future. UStream was formalized using mathematical theories, and several theorems were proposed and proved to validate the topology.
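The equal root-to-leaf delay claimed by UStream follows from the defining property of an ultra-metric: the strong triangle inequality d(x,z) ≤ max(d(x,y), d(y,z)). A minimal sketch of checking that property over a pairwise-delay matrix (the function name and dict-of-dicts representation are illustrative, not from the paper):

```python
def is_ultrametric(d):
    """Check the strong triangle inequality d(x,z) <= max(d(x,y), d(y,z))
    for a symmetric distance matrix given as a dict of dicts."""
    nodes = list(d)
    for x in nodes:
        for y in nodes:
            for z in nodes:
                if d[x][z] > max(d[x][y], d[y][z]):
                    return False
    return True

# In an ultra-metric tree, the distance between two leaves is the height of
# their lowest common ancestor, so all leaves end up equidistant from the root.
dist = {
    "a": {"a": 0, "b": 2, "c": 4},
    "b": {"a": 2, "b": 0, "c": 4},
    "c": {"a": 4, "b": 4, "c": 0},
}
print(is_ultrametric(dist))  # True
```

A topology maintaining this invariant over peer-to-peer delays would, by construction, give every child peer the same playback delay from the root.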
Citations: 3
A Dynamic Alpha Congestion Controller for WebRTC
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.63
R. Atwah, Razib Iqbal, S. Shirmohammadi, A. Javadtalab
Video conferencing applications have significantly changed the way people communicate over the Internet. Web Real-Time Communication (WebRTC), drafted by the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) working groups, has added new functionality to web browsers, allowing audio/video calls between browsers without the need to install any video telephony application. The Google Congestion Control (GCC) algorithm has been proposed as WebRTC's congestion control mechanism, but its performance is limited by its use of a fixed incoming-rate decrease factor, known as alpha (α). In this paper, we propose a dynamic alpha model that reduces the available receiving-bandwidth estimate during overuse, as indicated by the over-use detector. Experiments on our testbed show that, compared to a fixed alpha model, our proposed model achieves a 33% higher incoming rate and a 16% lower round-trip time while keeping a similar packet loss rate and video quality.
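To make the fixed-versus-dynamic alpha distinction concrete, here is a hedged sketch of a multiplicative rate decrease whose alpha scales with overuse severity. The linear interpolation rule, the parameter range, and the 0..1 overuse signal are illustrative assumptions, not the paper's exact model (GCC's original rule applies one fixed alpha regardless of severity):

```python
def decrease_estimate(incoming_rate, overuse_signal, alpha_min=0.8, alpha_max=0.95):
    """Illustrative dynamic-alpha decrease of the bandwidth estimate: the
    stronger the overuse signal (0..1), the closer alpha moves to alpha_min,
    i.e. the harsher the cut. This linear rule is an assumption for the
    sketch, not the paper's exact adaptation law."""
    alpha = alpha_max - (alpha_max - alpha_min) * overuse_signal
    return alpha * incoming_rate

# A mild overuse keeps more of the incoming rate than a severe one.
mild = decrease_estimate(1000.0, 0.1)    # alpha = 0.935 -> ~935 kbps
severe = decrease_estimate(1000.0, 0.9)  # alpha = 0.815 -> ~815 kbps
```

A gentler cut under mild overuse is what lets such a controller sustain a higher incoming rate without worsening loss, which matches the trend the authors report.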
Citations: 3
Feature Level Fusion for Bimodal Facial Action Unit Recognition
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.116
Zibo Meng, Shizhong Han, Min Chen, Yan Tong
Recognizing facial actions from spontaneous facial displays suffers from subtle and complex facial deformations, frequent head movements, and partial occlusions. It is especially challenging when facial activities are accompanied by speech. Instead of employing information solely from the visual channel, this paper presents a novel fusion framework that exploits information from both visual and audio channels in recognizing speech-related facial action units (AUs). In particular, features are first extracted from the visual and audio channels independently. Then, the audio features are aligned with the visual features to handle the difference in time scales and the time shift between the two signals. Finally, these aligned audio and visual features are integrated via a feature-level fusion framework and utilized in recognizing AUs. Experimental results on a new audiovisual AU-coded dataset demonstrate that the proposed feature-level fusion framework outperforms a state-of-the-art visual-based method in recognizing speech-related AUs, especially those that are "invisible" in the visual channel during speech. The improvement is even more pronounced when the facial images contain occlusions, which, fortunately, do not affect the audio channel.
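The align-then-concatenate pipeline the abstract describes can be sketched in a few lines. The nearest-neighbour alignment rule and the toy feature dimensions are simplifying assumptions; the paper's own alignment handles the time-scale difference and time shift more carefully:

```python
def align_audio_to_video(audio_feats, audio_times, video_times):
    """Nearest-neighbour alignment of audio feature frames to video
    timestamps, a simplified stand-in for the paper's alignment step."""
    aligned = []
    for t in video_times:
        i = min(range(len(audio_times)), key=lambda k: abs(audio_times[k] - t))
        aligned.append(audio_feats[i])
    return aligned

def fuse(visual_feats, audio_feats):
    """Feature-level fusion: concatenate the per-frame feature vectors."""
    return [v + a for v, a in zip(visual_feats, audio_feats)]

# Toy streams: audio is sampled faster than video, so it must be aligned first.
audio = [[0.1], [0.2], [0.3], [0.4]]
a_t = [0.00, 0.01, 0.02, 0.03]
video = [[1.0, 2.0], [3.0, 4.0]]
v_t = [0.00, 0.033]
fused = fuse(video, align_audio_to_video(audio, a_t, v_t))
print(fused)  # [[1.0, 2.0, 0.1], [3.0, 4.0, 0.4]]
```

The fused vectors then feed a single classifier per AU, which is what distinguishes feature-level fusion from fusing the two classifiers' decisions.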
Citations: 2
A Novel Semi-Supervised Dimensionality Reduction Framework for Multi-manifold Learning
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.73
Xin Guo, Tie Yun, L. Qi, L. Guan
In pattern recognition, the traditional single-manifold assumption can hardly guarantee the best classification performance, since data from multiple classes do not lie on a single manifold. When a dataset contains multiple classes and the structures of those classes differ, it is more reasonable to assume that each class lies on its own manifold. In this paper, we propose a novel framework of semi-supervised dimensionality reduction for multi-manifold learning. Within this framework, methods are derived to learn the multiple manifolds corresponding to the multiple classes in a data set, using both labeled and unlabeled examples. In order to connect each unlabeled point to other points from the same manifold, a similarity-graph construction based on sparse manifold clustering is introduced when building the neighbourhood graph. Experimental results verify the advantages and effectiveness of this new framework.
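The neighbourhood-graph construction at the heart of such methods can be sketched as follows. Note the stand-in: the paper builds the graph via sparse manifold clustering precisely so that unlabeled points connect only to points on their own manifold, whereas this sketch uses a plain k-NN graph with Gaussian weights:

```python
import math

def knn_similarity_graph(points, k=2, sigma=1.0):
    """Simplified neighbourhood-graph construction: connect each point to
    its k nearest neighbours with Gaussian weights exp(-d^2 / (2*sigma^2)).
    Plain k-NN is a stand-in for the paper's sparse-manifold-clustering
    construction."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nbrs:
            w = math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            W[i][j] = W[j][i] = max(W[i][j], w)
    return W

# Two well-separated clusters: within-cluster edges get substantial weight,
# and with a small k no edge crosses the gap between manifolds.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
W = knn_similarity_graph(points, k=1)
```

When the manifolds intersect or lie close together, plain k-NN links points across manifolds; that failure mode is what motivates the sparse-clustering-based construction.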
Citations: 2
Stepped-Line Text Layout with Phrased Segmentation for Readability Improvement of Japanese Electronic Text
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.87
Jumpei Kobayashi, Takashi Sekiguchi, E. Shinbori, T. Kawashima
We propose a new electronic text format with a stepped-line layout to optimize viewing position and improve the efficiency of reading Japanese text. Generally, the reader's eyes try to fixate on every phrase while reading Japanese text, yet no method has been proposed to date for optimizing the fixation position while reading. In spaced text such as English, the space characters provide boundary information for eye movement; in Japanese text, however, reading speed decreases when spaces are inserted between phrases. In the stepped-line text format proposed in this report, a text line is segmented and stepped down between phrases, and line breaks occur between phrases. To evaluate the effect of the stepped-line layout on reading efficiency, we measured reading speeds and eye movements for both the new layout and a conventional straight-line layout. Reading with the stepped-line layout is approximately 13% faster than with the straight-line layout, while the number of fixations is approximately 11% lower. This is primarily achieved by a reduction in the number of regressions and an increase in forward saccade length. Moreover, 91% of participants did not experience illegibility or incongruity when reading the stepped-line layout, suggesting that it is a new technique for improving the efficiency of eye movements during reading without increasing cognitive load.
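A plain-text mock-up of the layout idea, assuming phrase segmentation is already done (the step width and rendering are illustrative only; the paper's layout engine is not described at this level of detail):

```python
def stepped_lines(phrases, step=2):
    """Render a phrase-segmented line as a 'staircase': each phrase is a
    line of its own, indented one step further right, so the eye has a
    predictable down-and-right saccade target at every phrase boundary."""
    return "\n".join(" " * (step * i) + p for i, p in enumerate(phrases))

print(stepped_lines(["The reader's eyes", "fixate on", "every phrase"]))
```

The interesting engineering problem, untouched here, is the phrase segmentation itself, since Japanese text carries no spaces to mark phrase boundaries.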
Citations: 4
Network Adaptive Textured Mesh Generation for Collaborative 3D Tele-Immersion
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.111
Kevin Desai, K. Bahirat, S. Raghuraman, B. Prabhakaran
3D Tele-Immersion (3DTI) has emerged as an efficient environment for virtual interactions and collaborations in a variety of fields such as rehabilitation, education, and gaming. In 3DTI, geographically distributed users are captured using multiple cameras and immersed in a single virtual environment. The quality of experience depends on the available network bandwidth, the quality of the generated 3D model, and the time taken for rendering. In a collaborative environment, achieving high-quality, high-frame-rate rendering while transmitting data to multiple sites with different bandwidths is challenging. In this paper we introduce a network-adaptive textured mesh generation scheme that transmits data of varying quality based on the available bandwidth. To reduce the volume of information transmitted, a visual-quality-based vertex selection approach is used to generate a sparse representation of the user. This sparse representation is then transmitted to the receiver side, where a sweep-line-based technique is used to generate a 3D mesh of the user. High visual quality is maintained by transmitting a high-resolution texture image compressed with a lossy compression algorithm. In our studies, users were unable to notice visual quality variations in the rendered 3D model even at 90% compression.
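The bandwidth-adaptive vertex selection can be sketched as a budgeted top-k pick. The per-vertex byte cost and the scalar quality score are assumptions for illustration; the paper's actual visual-quality criterion and wire format are not specified here:

```python
def select_vertices(vertices, scores, budget_bytes, bytes_per_vertex=12):
    """Illustrative bandwidth-adaptive vertex selection: keep the vertices
    with the highest visual-quality scores that fit within the byte budget.
    Both the scoring and the 12-byte (3 x float32) vertex size are
    assumptions, not the paper's exact scheme."""
    max_count = budget_bytes // bytes_per_vertex
    order = sorted(range(len(vertices)), key=lambda i: -scores[i])
    keep = sorted(order[:max_count])  # restore original order for meshing
    return [vertices[i] for i in keep]

verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
scores = [0.9, 0.2, 0.8, 0.5]
print(select_vertices(verts, scores, budget_bytes=24))  # [(0, 0, 0), (0, 1, 0)]
```

Per-site budgets would let a sender serve several receivers of differing bandwidth from one capture, re-meshing each sparse set on the receiver side as the abstract describes.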
Citations: 4
ExploratoryVideoSearch: A Music Video Search System Based on Coordinate Terms and Diversification
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.99
Kosetsu Tsukuda, Masataka Goto
Many people search for and watch music videos on video sharing websites. Although a vast variety of videos are uploaded, current search systems on video sharing websites let users search only a limited range of music videos for an input query. The problem worsens when a user lacks knowledge of a query, because the user cannot adjust that range by changing the query or adding keywords to it. In this paper, we propose a music video search system, called ExploratoryVideoSearch, that is coordinate-term aware and diversity aware. Our system focuses on artist-name queries for searching videos on YouTube and has two novel functions: (1) given an artist-name query, the system shows search results for the artist as well as for its coordinate terms, and (2) the system diversifies the search results for the query and its coordinate terms, and allows users to interactively change the diversity level. Coordinate terms are obtained from the Million Song Dataset and Wikipedia, while search results are diversified based on the tags attached to YouTube music videos. ExploratoryVideoSearch thus enables users to search a wide variety of music videos without requiring deep knowledge about a query.
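Tag-based diversification is commonly done with a greedy relevance-versus-redundancy trade-off. The sketch below uses Jaccard overlap on tag sets and a tunable trade-off weight standing in for the user-adjustable diversity level; it is an illustrative stand-in, not the system's exact algorithm:

```python
def diversify(results, k=3, trade_off=0.5):
    """Greedy diversification sketch: repeatedly pick the video that best
    balances relevance against tag overlap (Jaccard) with already-picked
    videos. trade_off=1.0 ranks purely by relevance; lower values favour
    diversity, mimicking an adjustable diversity level."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    picked, rest = [], list(results)
    while rest and len(picked) < k:
        def score(r):
            sim = max((jaccard(r["tags"], p["tags"]) for p in picked), default=0.0)
            return trade_off * r["rel"] - (1 - trade_off) * sim
        best = max(rest, key=score)
        picked.append(best)
        rest.remove(best)
    return [r["title"] for r in picked]

results = [
    {"title": "live",  "rel": 0.9, "tags": {"rock", "live"}},
    {"title": "live2", "rel": 0.8, "tags": {"rock", "live"}},
    {"title": "cover", "rel": 0.6, "tags": {"acoustic", "cover"}},
]
print(diversify(results, k=2))  # ['live', 'cover']
```

With trade_off near 1.0 the same call would return the two near-duplicate live videos, which is exactly the behaviour diversification is meant to suppress.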
Citations: 3
Automatic Correction of Programming Exercises in ViPLab
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.69
T. Richter, Jan Vanvinkenroye
We describe the ViPLab plug-in for the ILIAS learning management system (LMS), which offers students a virtual programming laboratory and hence allows them to run programming exercises without ever leaving the browser. In particular, this article introduces a new component of the system that automatically corrects programming exercises and hence significantly simplifies the implementation of programming classes in freshman courses.
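The core of any such auto-correction component is running a submission against reference test cases and comparing outputs. A minimal sketch, assuming Python submissions graded by stdin/stdout comparison (the paper does not specify its grading protocol, and a real grader additionally needs sandboxing and resource limits):

```python
import os
import subprocess
import sys
import tempfile

def grade(source_code, cases):
    """Minimal auto-correction sketch: run the submitted program once per
    test case, feeding stdin and comparing trimmed stdout to the expected
    answer. Sandboxing and resource limits are deliberately omitted."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source_code)
        path = f.name
    passed = 0
    try:
        for stdin, expected in cases:
            out = subprocess.run([sys.executable, path], input=stdin,
                                 capture_output=True, text=True, timeout=5)
            passed += out.stdout.strip() == expected
    finally:
        os.unlink(path)
    return passed, len(cases)

submission = "print(int(input()) * 2)"
print(grade(submission, [("3", "6"), ("5", "10")]))  # (2, 2)
```

Output comparison is only one grading strategy; structural checks or unit tests against the student's functions are common alternatives.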
Citations: 0
Probabilistic Ensemble Fusion for Multimodal Word Sense Disambiguation
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.35
Yang Peng, D. Wang, Ishan Patwa, Dihong Gong, C. Fang
With the advent of abundant multimedia data on the Internet, research efforts in multimodal machine learning have sought to utilize data from different modalities. Current approaches mostly focus on developing models that fuse low-level features from multiple modalities and learn a unified representation across modalities. However, most related work fails to justify why multimodal data and multimodal fusion should be used, and few leverage the complementary relation among different modalities. In this paper, we first identify the correlative and complementary relations among multiple modalities. Then we propose a probabilistic ensemble fusion model to capture the complementary relation between two modalities (images and text). Experimental results on the UIUC-ISD dataset show that our ensemble approach outperforms approaches using only a single modality. Word sense disambiguation (WSD) is the use case we study to demonstrate the effectiveness of our probabilistic ensemble fusion model.
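A hedged sketch of what probabilistic ensemble fusion for WSD looks like at decision level: each modality produces a posterior over senses, and the ensemble combines them. The weighted-sum combination and renormalisation below are assumptions for illustration, not the paper's exact model:

```python
def ensemble_fuse(p_text, p_image, weight=0.5):
    """Sketch of probabilistic ensemble fusion: combine per-modality
    posteriors over word senses with a weighted sum and renormalise.
    The fixed-weight rule is an assumption, not the paper's model."""
    senses = set(p_text) | set(p_image)
    fused = {s: weight * p_text.get(s, 0.0) + (1 - weight) * p_image.get(s, 0.0)
             for s in senses}
    z = sum(fused.values())
    return {s: p / z for s, p in fused.items()}

# The complementary relation in action: text alone cannot disambiguate
# "bass", but the image modality tips the decision.
p_text = {"fish": 0.5, "instrument": 0.5}
p_image = {"fish": 0.9, "instrument": 0.1}
fused = ensemble_fuse(p_text, p_image)
print(max(fused, key=fused.get))  # fish
```

This toy case illustrates why the complementary relation matters: when one modality is uninformative for a given sense, the other can still resolve it, which a single-modality model cannot do.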
Citations: 5
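The abstract above does not spell out the fusion rule, so the following is only a minimal, hypothetical sketch of decision-level (late) ensemble fusion in the spirit it describes: each modality produces a posterior distribution over word senses, and the ensemble combines them as a convex combination. The function name `ensemble_fuse` and the fixed weights `w_image`/`w_text` are illustrative assumptions, not the paper's actual model, which would learn or calibrate how the modalities complement each other.

```python
def ensemble_fuse(p_image, p_text, w_image=0.5, w_text=0.5):
    """Fuse per-modality posteriors over word senses (hypothetical sketch).

    p_image, p_text: sequences of per-sense probabilities, each summing to 1.
    Returns a renormalized convex combination of the two distributions.
    """
    fused = [w_image * pi + w_text * pt for pi, pt in zip(p_image, p_text)]
    total = sum(fused)
    return [p / total for p in fused]  # renormalize for numerical safety


# Example: text strongly prefers sense 0, the image weakly prefers sense 1;
# the fused distribution lets the more confident modality dominate.
p_img = [0.4, 0.5, 0.1]
p_txt = [0.7, 0.2, 0.1]
fused = ensemble_fuse(p_img, p_txt)
predicted_sense = max(range(len(fused)), key=lambda i: fused[i])
```

The complementary relation the paper identifies would, under this sketch, show up as the two modalities disagreeing on individual examples while the fused posterior is more often correct than either alone.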
Democratizing Optometric Care: A Vision-Based, Data-Driven Approach to Automatic Refractive Error Measurement for Vision Screening
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.55
Tiffany C. K. Kwok, N. M. Shum, G. Ngai, H. Leong, G. A. Tseng, Hoi-yi Choi, Ka-yan Mak, C. Do
We present a vision-based, data-driven approach to identifying and measuring refractive errors in human subjects with low-cost, easily available equipment and no specialist training. Refractive errors (e.g., nearsightedness or astigmatism) are common ocular problems which, if uncorrected, may lead to serious visual impairment. Diagnosing such defects conventionally requires expensive specialist equipment and trained personnel, which is a barrier in many parts of the developing world. Our approach aims to democratize optometric care by utilizing the computational power inherent in consumer-grade devices and the advances made possible by multimedia computing. We present results showing that our system is able to match, and under certain conditions outperform, state-of-the-art medical devices.
Citations: 4
Journal
2015 IEEE International Symposium on Multimedia (ISM)