With the development and deployment of ubiquitous wireless networks, together with the growing popularity of mobile auto-stereoscopic 3D displays, more and more applications are being developed to enable rich 3D mobile multimedia experiences, including 3D display gaming. Simultaneously, with the emergence of cloud computing, more mobile applications are being developed to take advantage of elastic cloud resources. In this paper, we explore the possibility of Cloud Mobile 3D Display Gaming, where 3D video rendering and encoding are performed on cloud servers and the resulting 3D video is streamed over wireless networks to mobile devices with 3D displays. However, given the significantly higher bitrate requirement of 3D video, ensuring user experience may be a challenge under the bandwidth constraints of mobile networks. To address this challenge, different techniques have been proposed, including asymmetric graphics rendering and asymmetric video encoding. In this paper, for the first time, we propose a joint asymmetric graphics rendering and video encoding approach, where both the encoding quality and the rendering richness of the left and right views are asymmetric, to enhance the user experience of the cloud mobile 3D display gaming system. Specifically, we first conduct extensive user studies to develop a user experience model that accounts for both video encoding impairment and graphics rendering impairment. We also develop a model that relates the bitrate of the resulting video to the video encoding and graphics rendering settings. Finally, we propose an optimization algorithm that automatically chooses the video encoding and graphics rendering settings for the left and right views to ensure the best user experience under the given network conditions.
Experiments conducted using real 4G-LTE network profiles on a commercial cloud service demonstrate the improvement in user experience when the proposed optimization algorithm is applied.
{"title":"A Joint Asymmetric Graphics Rendering and Video Encoding Approach for Optimizing Cloud Mobile 3D Display Gaming User Experience","authors":"Yao Liu, Yao Liu, S. Dey","doi":"10.1109/ISM.2015.27","DOIUrl":"https://doi.org/10.1109/ISM.2015.27","url":null,"abstract":"With the development and deployment of ubiquitous wireless network together with the growing popularity of mobile auto-stereoscopic 3D displays, more and more applications have been developed to enable rich 3D mobile multimedia experiences, including 3D display gaming. Simultaneously, with the emergence of cloud computing, more mobile applications are being developed to take advantage of the elastic cloud resources. In this paper, we explore the possibility of developing Cloud Mobile 3D Display Gaming, where the 3D video rendering and encoding are performed on cloud servers, with the resulting 3D video streamed to mobile devices with 3D displays through wireless network. However, with the significantly higher bitrate requirement for 3D videos, ensuring user experience may be a challenge considering the bandwidth constraints of mobile networks. In order to address this challenge, different techniques have been proposed including asymmetric graphics rendering and asymmetric video encoding. In this paper, for the first time, we propose a joint asymmetric graphics rendering and video encoding approach, where both the encoding quality and rendering richness of left view and right view are asymmetric, to enhance the user experience of the cloud mobile 3D display gaming system. Specifically, we first conduct extensive user studies to develop a user experience model that takes into account both video encoding impairment and graphics rendering impairment. We also develop a model to relate the bitrate of the resulting video with the video encoding settings and graphics rendering settings. 
Finally we propose an optimization algorithm that can automatically choose the video encoding settings and graphics rendering settings for left view and right view to ensure the best user experience given the network conditions. Experiments conducted using real 4G-LTE network profiles on commercial cloud service demonstrate the improvement in user experience when the proposed optimization algorithm is applied.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124963910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
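The joint selection described above can be illustrated with a small exhaustive search over per-view settings. The bitrate and user-experience models below are toy stand-ins (the paper fits its models from user studies), and the rendering levels, QP values, and weights are invented for the sketch:

```python
from itertools import product

# Hypothetical per-view settings: rendering richness r in {1, 2, 3} and
# encoder quantization parameter qp in {26, 32, 38}.
RENDER_LEVELS = [1, 2, 3]
QP_LEVELS = [26, 32, 38]

def bitrate_kbps(r, qp):
    """Toy bitrate model: richer rendering and lower QP cost more bits."""
    return 400 * r * (44 - qp) / 12

def ux_score(settings):
    """Toy UX model: the better view dominates (binocular suppression),
    with a penalty for large inter-view asymmetry."""
    (rl, ql), (rr, qr) = settings
    qual = lambda r, qp: 2.0 * r - 0.1 * (qp - 26)
    left, right = qual(rl, ql), qual(rr, qr)
    return max(left, right) - 0.3 * abs(left - right)

def best_settings(bandwidth_kbps):
    """Exhaustively search joint left/right settings under the bitrate budget."""
    best, best_score = None, float("-inf")
    for lv, rv in product(product(RENDER_LEVELS, QP_LEVELS), repeat=2):
        total = bitrate_kbps(*lv) + bitrate_kbps(*rv)
        if total <= bandwidth_kbps:
            score = ux_score((lv, rv))
            if score > best_score:
                best, best_score = (lv, rv), score
    return best, best_score
```

With a generous budget the search picks symmetric maximum-quality settings; as the budget shrinks, the asymmetry penalty trades off against staying within the budget.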
In pattern recognition, the traditional single-manifold assumption can hardly guarantee the best classification performance, since data from multiple classes do not lie on a single manifold. When a dataset contains multiple classes whose structures differ, it is more reasonable to assume that each class lies on its own manifold. In this paper, we propose a novel framework of semi-supervised dimensionality reduction for multi-manifold learning. Within this framework, methods are derived to learn the multiple manifolds corresponding to the multiple classes in a data set, using both the labeled and unlabeled examples. To connect each unlabeled point to other points from the same manifold, a similarity graph construction based on sparse manifold clustering is introduced when building the neighbourhood graph. Experimental results verify the advantages and effectiveness of this new framework.
{"title":"A Novel Semi-Supervised Dimensionality Reduction Framework for Multi-manifold Learning","authors":"Xin Guo, Tie Yun, L. Qi, L. Guan","doi":"10.1109/ISM.2015.73","DOIUrl":"https://doi.org/10.1109/ISM.2015.73","url":null,"abstract":"In pattern recognition, traditional single manifold assumption can hardly guarantee the best classification performance, since the data from multiple classes does not lie on a single manifold. When the dataset contains multiple classes and the structure of the classes are different, it is more reasonable to assume each class lies on a particular manifold. In this paper, we propose a novel framework of semi-supervised dimensionality reduction for multi-manifold learning. Within this framework, methods are derived to learn multiple manifold corresponding to multiple classes in a data set, including both the labeled and unlabeled examples. In order to connect each unlabeled point to the other points from the same manifold, a similarity graph construction, based on sparse manifold clustering, is introduced when constructing the neighbourhood graph. Experimental results verify the advantages and effectiveness of this new framework.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122012363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We describe the ViPLab plug-in for the ILIAS learning management system (LMS), which offers students a virtual programming laboratory and thus allows them to run programming exercises without ever leaving the browser. In particular, this article introduces a new component of the system that automatically corrects programming exercises and hence significantly simplifies the implementation of programming classes in freshman courses.
{"title":"Automatic Correction of Programming Exercises in ViPLab","authors":"T. Richter, Jan Vanvinkenroye","doi":"10.1109/ISM.2015.69","DOIUrl":"https://doi.org/10.1109/ISM.2015.69","url":null,"abstract":"We describe the ViPLab plug-in for the ILIAS learning management system (LMS) that offers students a virtual programming laboratory and hence allows them to run programming exercises without ever having to leave the browser. In particular, this article introduces one new component of the system that allows automatic correction of programming exercises and hence simplifies the implementation of programming classes in freshmen courses significantly.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127050327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jumpei Kobayashi, Takashi Sekiguchi, E. Shinbori, T. Kawashima
We propose a new electronic text format with a stepped-line layout that optimizes viewing position and improves the efficiency of reading Japanese text. Generally, a reader's eyes try to fixate on every phrase while reading Japanese text, yet no method has been proposed to date for optimizing the fixation position during reading. In spaced text such as English, the space characters provide boundary information for eye movements; in Japanese text, however, inserting spaces between phrases decreases reading speed. In the new stepped-line format proposed in this report, a text line is segmented and stepped down between phrases, and line breaks fall only between phrases. To evaluate the effect of the stepped-line layout on reading efficiency, we measured reading speeds and eye movements for both the new layout and a conventional straight-line layout. Reading with the stepped-line layout is approximately 13% faster than with the straight-line layout, while the number of fixations is approximately 11% lower. This is achieved primarily by a reduction in the number of regressions and an increase in forward saccade length. Moreover, 91% of participants experienced no illegibility or incongruousness when reading the stepped-line layout, suggesting that it is a new technique for improving the efficiency of eye movements during reading without increasing cognitive load.
{"title":"Stepped-Line Text Layout with Phrased Segmentation for Readability Improvement of Japanese Electronic Text","authors":"Jumpei Kobayashi, Takashi Sekiguchi, E. Shinbori, T. Kawashima","doi":"10.1109/ISM.2015.87","DOIUrl":"https://doi.org/10.1109/ISM.2015.87","url":null,"abstract":"We propose a new electronic text format with a stepped-line layout to optimize viewing position and to improve the efficiency of reading Japanese text. Generally, the reader's eyes try to fixate on every phrase while reading Japanese text. To date, no method has been proposed to optimize the fixation position while reading. In case of spaced text such as English, the space characters provide the boundary information for eye movement, however, in case of Japanese text, reading speed decreases by inserting spaces between phrases. With the new stepped-line text format proposed in this report, a text line is segmented and stepped down between phrases, moreover, line breaks are present between phrases. To evaluate the effect of the stepped-line layout on the reading efficiency, we measured reading speeds and eye movements for both the new layout and a conventional straight-line layout. The reading speed for the new stepped-line layout is approximately 13% faster compared to the straight-line layout, whereas the number of fixations in the stepped-line layout is approximately 11% less than that in the straight-line layout. This is primarily achieved by a reduction in the number of regressions and an increase in the forward saccade length. 
Moreover, 91% of participants did not experience illegibility or incongruousness with the stepped-line layout reading, suggesting that the stepped-line layout is a new technique for improving the efficiency of eye movements while reading without any increase in cognitive load.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127156331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
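As a rough illustration of the idea, the sketch below renders pre-segmented phrases with a monospace-indent approximation of the stepped layout (the actual format steps phrases down within a continuing line); `phrases_per_line` and `step` are invented parameters, and phrase segmentation itself is assumed done upstream:

```python
def stepped_layout(phrases, phrases_per_line=3, step=2):
    """Approximate a stepped-line layout in plain text: within a visual line,
    each successive phrase is stepped down (here: indented further), and a
    break is only ever placed at a phrase boundary."""
    lines = []
    for i in range(0, len(phrases), phrases_per_line):
        group = phrases[i:i + phrases_per_line]
        for j, phrase in enumerate(group):
            lines.append(" " * (j * step) + phrase)
    return "\n".join(lines)
```

The point of the sketch is the invariant the paper relies on: every break coincides with a phrase boundary, so each fixation can land on exactly one phrase.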
Recently, high-performance face recognition in real-world scenarios has attracted research attention. Thanks to advances in sensor technology, face recognition systems equipped with multiple sensors have been widely researched; among them, systems using near-infrared imagery have been an important research topic. In this paper, the complementary effect residing in face images captured under near-infrared and visible light is exploited by combining the two distinct spectral images. We propose a new texture feature (a multispectral texture feature) extraction method using synthesized face images to achieve high-performance face recognition with an illumination-invariant property. The experimental results show that the proposed method enhances the discriminative power of features thanks to the complementary effect.
{"title":"Multispectral Texture Features from Visible and Near-Infrared Synthetic Face Images for Face Recognition","authors":"Hyungil Kim, Seung-ho Lee, Yong Man Ro","doi":"10.1109/ISM.2015.95","DOIUrl":"https://doi.org/10.1109/ISM.2015.95","url":null,"abstract":"Recently, high-performance face recognition has attracted research attention in real-world scenarios. Thanks to the advances in sensor technology, face recognition system equipped with multiple sensors has been widely researched. Among them, face recognition system with near-infrared imagery has been one important research topic. In this paper, complementary effect resided in face images captured by nearinfrared and visible rays is exploited by combining two distinct spectral images (i.e., face images captured by near-infrared and visible rays). We propose a new texture feature (i.e., multispectral texture feature) extraction method with synthesized face images to achieve high-performance face recognition with illumination-invariant property. The experimental results show that the proposed method enhances the discriminative power of features thanks the complementary effect.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126174905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Atwah, Razib Iqbal, S. Shirmohammadi, A. Javadtalab
Video conferencing applications have significantly changed the way people communicate over the Internet. Web Real-Time Communication (WebRTC), drafted by World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) working groups, has added new functionality to web browsers, allowing audio/video calls between browsers without the need to install any video telephony application. The Google Congestion Control (GCC) algorithm has been proposed as WebRTC's congestion control mechanism, but its performance is limited because it uses a fixed incoming-rate decrease factor, known as alpha (α). In this paper, we propose a dynamic alpha model that reduces the estimate of the available receiving bandwidth during overuse, as indicated by the over-use detector. Experiments on our testbed show that the proposed model achieves a 33% higher incoming rate and a 16% lower round-trip time than a fixed-alpha model, while maintaining a similar packet loss rate and video quality.
{"title":"A Dynamic Alpha Congestion Controller for WebRTC","authors":"R. Atwah, Razib Iqbal, S. Shirmohammadi, A. Javadtalab","doi":"10.1109/ISM.2015.63","DOIUrl":"https://doi.org/10.1109/ISM.2015.63","url":null,"abstract":"Video conferencing applications have significantly changed the way people communicate over the Internet. Web Real-Time Communication (WebRTC), drafted by the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) working groups, has added new functionality to the web browsers, allowing audio/video calls between browsers without the need to install any video telephony applications. The Google Congestion Control (GCC) algorithm has been proposed as WebRTC's congestion control mechanism, but its performance is limited due to using a fixed incoming rate decrease factor, known as alpha (a). In this paper, we propose a dynamic alpha model to reduce the available receiving bandwidth estimate during overuse as indicated by the over-use detector. Experiments using our specific testbed show that our proposed model achieves a 33% higher incoming rate and a 16% lower round-trip time, while keeping a similar packet loss rate and video quality, compared to a fixed alpha model.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115476782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The last decade has seen a sharp increase in Internet traffic due to the proliferation of video. Despite this growth, the user's quality of experience (QoE) in peer-to-peer (P2P) streaming systems does not match conventional television service. The deployment of P2P streaming systems is affected by long delays, unplanned interruptions, flash crowds, high churn, and the choice of overlay structure. The overlay structure plays a significant role in ensuring that traffic is distributed over all physical links in a dynamic and fair manner; tree-based (TB) and mesh-based (MB) overlays are the most popular. TB fails when a parent peer fails, which can lead to total collapse of the system, while MB is more vulnerable to flash crowds and high churn due to its unstructured pattern. This paper presents a novel P2P streaming topology (UStream) that uses a hybrid of TB and MB to address the disadvantages of both topologies. Furthermore, UStream adopts the features of an ultra-metric tree to ensure that the time taken from the root peer to any child peer is equal, and a spanning tree to monitor all peers at any point in time. UStream also employs the principle of chaos theory: the present peer determines the future, though the approximate present does not approximately determine the future. UStream was formalized using mathematical theories; several theorems were proposed and proved to validate this topology.
{"title":"UStream: Ultra-Metric Spanning Overlay Topology for Peer-to-Peer Streaming Systems","authors":"O. Ojo, A. Oluwatope, Olufemi Ogunsola","doi":"10.1109/ISM.2015.82","DOIUrl":"https://doi.org/10.1109/ISM.2015.82","url":null,"abstract":"The last decade has seen a sharp increase in Internet traffic due to dispersion of videos. Despite this growth, user's quality of experience (QoE) in peer-to-peer (P2P) streaming systems does not match the conventional television service. The deployment of P2P streaming system is affected by long delay, unplanned interruption, flash crowd, high churn situation and choice of overlay structure. The overlay structure plays significant role in ensuring that traffic are distributed to all physical links in a dynamic and fair manner, tree-based (TB) and mesh-based (MB) are the most popular. TB fails in situations where there is failure at the parent peer which can lead to total collapse of the system while MB is more vulnerable to flash crowd and high churn situation due to its unstructured pattern. This paper presents a novel P2P streaming topology (UStream), using a hybrid of TB and MB to address the disadvantages of both topologies to ensure an optimal solution. Furthermore, UStream adopts the features of ultra-metric tree to ensure that the time taken from the root peer to any of the children's peer are equal and the spanning tree to monitor all the peers at any point in time. Ustream also employs the principle of chaos theory. The present peer determines the future, though the approximate present does not approximately determines the future. Ustream was formalized using mathematical theories. 
Several theorems were proposed and proved in validating this topology.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128677389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
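The ultra-metric property underlying equal root-to-leaf delay is the strong triangle inequality d(x, z) ≤ max(d(x, y), d(y, z)). A small checker, with `d` an arbitrary symmetric distance function over peer identifiers (the peer model here is an assumption for illustration):

```python
from itertools import combinations

def is_ultrametric(d, points, tol=1e-9):
    """Check the strong triangle inequality for every triple of points; an
    ultrametric distance is what makes all root-to-leaf delays in the
    corresponding tree equal."""
    for x, y, z in combinations(points, 3):
        # the three rotations cover every choice of middle point
        for a, b, c in ((x, y, z), (y, z, x), (z, x, y)):
            if d(a, c) > max(d(a, b), d(b, c)) + tol:
                return False
    return True
```

Ordinary Euclidean distances on a line are metric but not ultrametric, which is why a plain spanning tree over them cannot guarantee equal delays.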
Tiffany C. K. Kwok, N. M. Shum, G. Ngai, H. Leong, G. A. Tseng, Hoi-yi Choi, Ka-yan Mak, C. Do
We present a vision-based, data-driven approach to identifying and measuring refractive errors in human subjects with low-cost, easily available equipment and no specialist training. Vision problems such as refractive errors (e.g., nearsightedness and astigmatism) are common ocular problems which, if uncorrected, may lead to serious visual impairment. Diagnosing such defects conventionally requires expensive specialist equipment and trained personnel, which is a barrier in many parts of the developing world. Our approach aims to democratize optometric care by utilizing the computational power inherent in consumer-grade devices and the advances made possible by multimedia computing. We present results showing that our system is able to match, and under certain conditions outperform, state-of-the-art medical devices.
{"title":"Democratizing Optometric Care: A Vision-Based, Data-Driven Approach to Automatic Refractive Error Measurement for Vision Screening","authors":"Tiffany C. K. Kwok, N. M. Shum, G. Ngai, H. Leong, G. A. Tseng, Hoi-yi Choi, Ka-yan Mak, C. Do","doi":"10.1109/ISM.2015.55","DOIUrl":"https://doi.org/10.1109/ISM.2015.55","url":null,"abstract":"We present a vision-based, data-driven approach to identifying and measuring refractive errors in human subjects with low-cost, easily available equipment and no specialist training. Vision problems, such as refractive error (e.g. nearsightedness, astigmatism, etc) are common ocular problems, which, if uncorrected, may lead to serious visual impairment. The diagnosis of such defects conventionally requires expensive specialist equipment and trained personnel, which is a barrier in many parts of the developing world. Our approach aims to democratize optometric care by utilizing the computational power inherent in consumer-grade devices and the advances made possible by multimedia computing. We present results that show our system is able to match and outperform state-of-the-art medical devices under certain conditions.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130795839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yang Peng, D. Wang, Ishan Patwa, Dihong Gong, C. Fang
With the advent of abundant multimedia data on the Internet, there have been research efforts on multimodal machine learning to utilize data from different modalities. Current approaches mostly focus on developing models that fuse low-level features from multiple modalities and learn a unified representation across modalities. However, most related work fails to justify why multimodal data and multimodal fusion should be used, and few works leverage the complementary relation among modalities. In this paper, we first identify the correlative and complementary relations among multiple modalities. We then propose a probabilistic ensemble fusion model to capture the complementary relation between two modalities (images and text). Experimental results on the UIUC-ISD dataset show that our ensemble approach outperforms approaches using only a single modality. Word sense disambiguation (WSD) is the use case we study to demonstrate the effectiveness of our probabilistic ensemble fusion model.
{"title":"Probabilistic Ensemble Fusion for Multimodal Word Sense Disambiguation","authors":"Yang Peng, D. Wang, Ishan Patwa, Dihong Gong, C. Fang","doi":"10.1109/ISM.2015.35","DOIUrl":"https://doi.org/10.1109/ISM.2015.35","url":null,"abstract":"With the advent of abundant multimedia data on the Internet, there have been research efforts on multimodal machine learning to utilize data from different modalities. Current approaches mostly focus on developing models to fuse low-level features from multiple modalities and learn unified representation from different modalities. But most related work failed to justify why we should use multimodal data and multimodal fusion, and few of them leveraged the complementary relation among different modalities. In this paper, we first identify the correlative and complementary relations among multiple modalities. Then we propose a probabilistic ensemble fusion model to capture the complementary relation between two modalities (images and text). Experimental results on the UIUC-ISD dataset show our ensemble approach outperforms approaches using only single modality. Word sense disambiguation (WSD) is the use case we studied to demonstrate the effectiveness of our probabilistic ensemble fusion model.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130536781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many people search for and watch music videos on video sharing websites. Although a vast variety of videos are uploaded, current search systems on video sharing websites let users retrieve only a limited range of music videos for an input query. The problem is especially acute when a user does not know enough about a query to customize that range by changing the query or adding keywords. In this paper, we propose a music video search system, called ExploratoryVideoSearch, that is coordinate-term aware and diversity aware. Our system focuses on artist-name queries for searching videos on YouTube and has two novel functions: (1) given an artist-name query, the system shows search results for the artist as well as for its coordinate terms, and (2) the system diversifies the search results for the query and its coordinate terms, and allows users to interactively change the diversity level. Coordinate terms are obtained from the Million Song Dataset and Wikipedia, while search results are diversified based on the tags attached to YouTube music videos. ExploratoryVideoSearch thus enables users to search a wide variety of music videos without requiring deep knowledge about a query.
{"title":"ExploratoryVideoSearch: A Music Video Search System Based on Coordinate Terms and Diversification","authors":"Kosetsu Tsukuda, Masataka Goto","doi":"10.1109/ISM.2015.99","DOIUrl":"https://doi.org/10.1109/ISM.2015.99","url":null,"abstract":"Many people search and watch music videos on video sharing websites. Although a vast variety of videos are uploaded, current search systems on video sharing websites allow users to search a limited range of music videos for an input query. Especially when a user does not have enough knowledge of a query, the problem gets worse because the user cannot customize the range by changing the query or adding some keywords to the original query. In this paper, we propose a music video search system, called ExploratoryVideoSearch, that is coordinate term aware and diversity aware. Our system focuses on artist name queries to search videos on YouTube and has two novel functions: (1) given an artist name query, the system shows a search result for the artist as well as those for its coordinate terms, and (2) the system diversifies search results for the query and its coordinate terms, and allows users to interactively change the diversity level. Coordinate terms are obtained by utilizing the Million Song Dataset and Wikipedia, while search results are diversified based on tags attached to YouTube music videos. 
ExploratoryVideoSearch enables users to search a wide variety of music videos without requiring deep knowledge about a query.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125511369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
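Tag-based diversification with an adjustable diversity level can be sketched as a greedy, MMR-style re-ranking; the relevance scores and Jaccard tag overlap below are assumptions for illustration, not the system's actual scoring:

```python
def diversify(results, k, trade_off=0.5):
    """Greedily pick k videos, balancing relevance against tag overlap
    (Jaccard) with the videos already picked. trade_off=1.0 ranks purely by
    relevance; lower values favour novelty. Each result is a tuple
    (video_id, relevance, tag_set)."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    picked, rest = [], list(results)
    while rest and len(picked) < k:
        best = max(rest, key=lambda r: trade_off * r[1]
                   - (1 - trade_off) * max((jaccard(r[2], p[2])
                                            for p in picked), default=0.0))
        picked.append(best)
        rest.remove(best)
    return [r[0] for r in picked]
```

Exposing `trade_off` as a user-facing slider corresponds to the system's interactive diversity level: at one extreme the top results may all be live versions of the same song; at the other, each result covers a different tag cluster.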